SADDLEPOINT APPROXIMATIONS IN STATISTICS¹
By H. E. DANIELS 


University of Cambridge and University of Chicago 


1. Introduction and summary. It is often required to approximate to the distribution of some statistic whose exact distribution cannot be conveniently obtained. When the first few moments are known, a common procedure is to fit a law of the Pearson or Edgeworth type having the same moments as far as they are given. Both these methods are often satisfactory in practice, but have the drawback that errors in the "tail" regions of the distribution are sometimes comparable with the frequencies themselves. The Edgeworth approximation in particular notoriously can assume negative values in such regions.

The characteristic function of the statistic may be known, and the difficulty is then the analytical one of inverting a Fourier transform explicitly. In this paper we show that for a statistic such as the mean of a sample of size $n$, or the ratio of two such means, a satisfactory approximation to its probability density, when it exists, can be obtained nearly always by the method of steepest descents. This gives an asymptotic expansion in powers of $n^{-1}$ whose dominant term, called the saddlepoint approximation, has a number of desirable features. The error incurred by its use is $O(n^{-1})$ as against the more usual $O(n^{-1/2})$ associated with the normal approximation. Moreover it is shown that in an important class of cases the relative error of the approximation is uniformly $O(n^{-1})$ over the whole admissible range of the variable.

The method of steepest descents was first used systematically by Debye for Bessel functions of large order (Watson [17]) and was introduced by Darwin and Fowler (Fowler [9]) into statistical mechanics, where it has remained an indispensable tool. Apart from the work of Jeffreys [12] and occasional isolated applications by other writers (e.g. Cox [2]), the technique has been largely ignored by writers on statistical theory.

In the present paper, distributions having probability densities are discussed first, the saddlepoint approximation and its associated asymptotic expansion being obtained for the probability density of the mean $\bar{x}$ of a sample of $n$. It is shown how the steepest descents technique is related to an alternative method used by Khinchin [14] and, in a slightly different context, by Cramér [5]. General conditions are established under which the relative error of the saddlepoint approximation is $O(n^{-1})$ uniformly for all admissible $\bar{x}$, with a corresponding result for the asymptotic expansion. The case of discrete variables is briefly discussed, and finally the method is used for approximating to the distribution of ratios.


2. Mean of n independent identically distributed random variables. Let $x$ be a continuously distributed random variable with distribution function $F(x)$.


Received 1/16/53, revised 3/31/54.
¹ Research carried out partly under sponsorship of the Office of Naval Research.

Assume that a density function $f(x) = F'(x)$ exists and suppose the moment-generating function
$$M(T) = e^{K(T)} = \int_{-\infty}^{\infty} e^{Tx} f(x)\,dx$$
converges for real $T$ in some nonvanishing interval containing the origin. Let $-c_1 < T < c_2$ be the largest such interval, where $0 \le c_1 \le \infty$ and $0 \le c_2 \le \infty$ but $c_1 + c_2 > 0$. Thus either $c_1$ or $c_2$ may be zero, though not both, and the moments need not all exist.

Consider the mean $\bar{x}$ of $n$ independent $x$'s. Its density function $f_n(\bar{x}) = F_n'(\bar{x})$ is given by the usual Fourier inversion formula
$$f_n(\bar{x}) = \frac{n}{2\pi}\int_{-\infty}^{\infty} M^n(it)\,e^{-int\bar{x}}\,dt. \tag{2.1}$$
(More generally $\int_{-\infty}^{\infty}$ may be replaced by $\lim_{\omega\to\infty}\int_{-\omega}^{\omega}$, but the argument is unaffected.) It is convenient here to employ the equivalent inversion formula
$$f_n(\bar{x}) = \frac{n}{2\pi i}\int_{\tau-i\infty}^{\tau+i\infty} e^{n[K(T)-T\bar{x}]}\,dT, \tag{2.2}$$
where $-c_1 < \tau < c_2$ on the path of integration, and $K(T)$ is the cumulant-generating function.

When $n$ is large, an approximation to $f_n(\bar{x})$ is found by choosing the path of integration to pass through a saddlepoint of the integrand in such a way that the integrand is negligible outside its immediate neighbourhood. The saddlepoints are situated where the exponent has zero derivative, that is where
$$K'(T) = \bar{x}. \tag{2.3}$$


We shall prove in Section 6 that under general conditions (2.3) has a single real root $T_0$ in $(-c_1, c_2)$ for every value of $\bar{x}$ such that $0 < F_n(\bar{x}) < 1$, and that $K''(T_0) > 0$. Let us choose the path of integration to be a straight line through $T_0$ parallel to the imaginary axis. Since $K(T) - T\bar{x}$ has a minimum at $T_0$ for real $T$, the modulus of the integrand must have a maximum at $T_0$ on the chosen path. Now we can show by a familiar argument (cf. Wintner [18], p. 14) that on any admissible straight line parallel to the imaginary axis the integrand attains its maximum modulus only where the line crosses the real axis. For on the line $T = \tau + iy$,
$$\left|M(T)e^{-T\bar{x}}\right| = e^{-\tau\bar{x}}\left|\int_{-\infty}^{\infty} e^{(\tau+iy)x}\,dF(x)\right| \le e^{-\tau\bar{x}}M(\tau).$$
Equality cannot hold for some $y \ne 0$, otherwise $\int_{-\infty}^{\infty} e^{(\tau+iy)x}\,dF(x) = M(\tau)e^{i\alpha}$ for some real $\alpha$, so that $\int_{-\infty}^{\infty} e^{\tau x}[1 - \cos(yx - \alpha)]\,dF(x) = 0$, which contradicts the existence of a density function. Moreover, since $M(\tau + iy) = o(1)$ for large $|y|$ by the Riemann-Lebesgue lemma, the integrand cannot approach arbitrarily near its maximum modulus as $|y|$ becomes large. Consequently, for the particular path chosen, only the neighbourhood of $T_0$ need be considered when $n$ is large.
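The argument then proceeds formally as follows. On the contour near $T_0$,
$$K(T) - T\bar{x} = K(T_0) - T_0\bar{x} - \tfrac{1}{2}K''(T_0)y^2 - \tfrac{1}{6}K'''(T_0)\,iy^3 + \tfrac{1}{24}K^{(4)}(T_0)y^4 + \cdots. \tag{2.4}$$
Setting $y = v/[nK''(T_0)]^{1/2}$ and expanding the integrand we get
$$f_n(\bar{x}) \sim \left[\frac{n}{2\pi K''(T_0)}\right]^{1/2} e^{n[K(T_0)-T_0\bar{x}]}\,\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-v^2/2}\left\{1 - \frac{i\lambda_3(T_0)v^3}{6n^{1/2}} + \frac{1}{n}\left[\frac{\lambda_4(T_0)v^4}{24} - \frac{\lambda_3^2(T_0)v^6}{72}\right] + \cdots\right\}dv, \tag{2.5}$$
where $\lambda_j(T) = K^{(j)}(T)/[K''(T)]^{j/2}$ for $j \ge 3$. The odd powers of $v$ vanish on integration and, since the normal moments are $\int v^4e^{-v^2/2}dv/\sqrt{2\pi} = 3$ and $\int v^6e^{-v^2/2}dv/\sqrt{2\pi} = 15$, we obtain an expansion in powers of $n^{-1}$,
$$f_n(\bar{x}) \sim g_n(\bar{x})\left\{1 + \frac{1}{n}\left[\tfrac{1}{8}\lambda_4(T_0) - \tfrac{5}{24}\lambda_3^2(T_0)\right] + \cdots\right\}, \tag{2.6}$$
where $g_n(\bar{x}) = [n/2\pi K''(T_0)]^{1/2}\,e^{n[K(T_0)-T_0\bar{x}]}$. We call $g_n(\bar{x})$ the saddlepoint approximation to $f_n(\bar{x})$.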
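The Python sketch below is an illustration added here, not code from the paper. It assumes the exponential density $f(x) = e^{-x}$ ($x > 0$), so that $K(T) = -\log(1-T)$ for $T < 1$, finds $T_0$ from (2.3) by bisection, and evaluates $g_n(\bar{x})$ with and without the $n^{-1}$ bracket of (2.6). The helper names `solve_saddlepoint`, `saddlepoint_density` and `exact_density` are hypothetical.

```python
import math

def solve_saddlepoint(k1, xbar, lo, hi, iters=200):
    """Root T0 of K'(T) = xbar by bisection, assuming K' increases on (lo, hi)
    and K'(lo) < xbar < K'(hi) (cf. Section 6)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if k1(mid) < xbar:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def saddlepoint_density(cgf, xbar, n, lo, hi):
    """Return g_n(xbar) and g_n(xbar) multiplied by the bracket of (2.6).

    `cgf` is a tuple (K, K', K'', K''', K'''') of callables (an assumed interface)."""
    K, k1, k2, k3, k4 = cgf
    t0 = solve_saddlepoint(k1, xbar, lo, hi)
    v = k2(t0)
    g = math.sqrt(n / (2 * math.pi * v)) * math.exp(n * (K(t0) - t0 * xbar))
    lam3 = k3(t0) / v ** 1.5
    lam4 = k4(t0) / v ** 2
    return g, g * (1 + (lam4 / 8 - 5 * lam3 ** 2 / 24) / n)

# Exponential(1): K(T) = -log(1 - T) and its first four derivatives, valid for T < 1.
exp_cgf = (
    lambda t: -math.log1p(-t),
    lambda t: 1 / (1 - t),
    lambda t: 1 / (1 - t) ** 2,
    lambda t: 2 / (1 - t) ** 3,
    lambda t: 6 / (1 - t) ** 4,
)

def exact_density(xbar, n):
    """The mean of n Exp(1) variables has a gamma density with shape n and rate n."""
    return math.exp(n * math.log(n) + (n - 1) * math.log(xbar) - n * xbar - math.lgamma(n))

n = 5
for xbar in (0.4, 1.3):
    g, g_corr = saddlepoint_density(exp_cgf, xbar, n, -50.0, 1 - 1e-12)
    f = exact_density(xbar, n)
    print(xbar, f / g, f / g_corr)
```

For this density $\lambda_3 = 2$ and $\lambda_4 = 6$ at every $T$, so the bracket in (2.6) equals $1 - 1/(12n)$ and the printed ratio $f_n/g_n$ is the same for each $\bar{x}$, in line with Example 5.2.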


3. The method of steepest descents. It is not apparent from the above formal development that (2.6) is a proper asymptotic expansion in which the remainder is of the same order as the last term neglected. The asymptotic nature of an expansion of this type is usually established by the method of steepest descents with the aid of a lemma due to Watson [17], the path of integration being the curve of steepest descent through $T_0$, upon which the modulus of the integrand decreases most rapidly. An account of the method is given by Jeffreys and Jeffreys [13]. The analysis is simplified by using a "truncated" version of Watson's lemma introduced by Jeffreys and Jeffreys for this purpose.² The special form appropriate to the present discussion is as follows.

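LEMMA. If $\psi(w)$ is analytic in a neighbourhood of $w = 0$ and bounded for real $w$ in an interval $-A \le w \le B$ with $A > 0$ and $B > 0$, then
$$\left(\frac{n}{2\pi}\right)^{1/2}\int_{-A}^{B} e^{-nw^2/2}\,\psi(w)\,dw \sim \psi(0) + \frac{1}{2n}\psi''(0) + \cdots + \frac{1}{(2n)^r}\frac{\psi^{(2r)}(0)}{r!} + \cdots$$
is an asymptotic expansion in powers of $n^{-1}$.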
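A quick numerical check of the lemma (an illustrative sketch, not from the paper): with the assumed choice $\psi(w) = e^w$, every $\psi^{(2r)}(0) = 1$, so the series is $\sum_r (2n)^{-r}/r! = e^{1/(2n)}$, and each additional term should reduce the error by roughly a factor $n$. Simpson's rule and the values $n = 40$, $A = 1$, $B = 2$ are arbitrary choices.

```python
import math

def watson_lhs(psi, n, A, B, m=20000):
    """(n / 2 pi)^(1/2) * integral over [-A, B] of exp(-n w^2 / 2) psi(w) dw (Simpson, m even)."""
    h = (A + B) / m
    total = 0.0
    for k in range(m + 1):
        w = -A + k * h
        weight = 1 if k in (0, m) else (4 if k % 2 else 2)
        total += weight * math.exp(-n * w * w / 2) * psi(w)
    return math.sqrt(n / (2 * math.pi)) * total * h / 3

def watson_series(even_derivs, n):
    """Partial sum of psi^(2r)(0) / ((2n)^r r!) for r = 0, 1, ..., len(even_derivs) - 1."""
    return sum(d / ((2 * n) ** r * math.factorial(r)) for r, d in enumerate(even_derivs))

# psi(w) = e^w: all even derivatives at 0 equal 1; the full series sums to e^(1/(2n)).
n, A, B = 40, 1.0, 2.0
lhs = watson_lhs(math.exp, n, A, B)
errors = [abs(lhs - watson_series([1.0] * terms, n)) for terms in (1, 2, 3)]
print(lhs, errors)
```

Truncating the range to $[-A, B]$ only changes the integral by an exponentially small amount in $n$, which is why the finite limits do not spoil the expansion.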


² The proof given in [13] contains an error which will be corrected in the forthcoming new edition.

To apply the lemma, deform the contour so that for $|T - T_0| \le \delta$ the line $T = T_0 + iy$ is replaced by the curve of steepest descent, which is that branch of $\operatorname{Im}\{K(T) - T\bar{x}\} = 0$ touching $T = T_0 + iy$ at $T_0$, where $\delta$ is chosen small enough to exclude possible saddlepoints other than $T_0$. The contour is thereafter continued along the orthogonal curves of constant $\operatorname{Re}\{K(T) - T\bar{x}\}$. These can easily be shown to meet the original path in points $T_0 - i\alpha$ and $T_0 + i\beta$, where $\alpha > 0$ and $\beta > 0$, if $\delta$ is small enough, since $T_0$ is a simple root of (2.3). The rest of the contour remains as before.

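On the steepest descent curve, $K(T) - T\bar{x}$ is real and decreases steadily on each side of $T_0$. Make the substitution
$$\begin{aligned} -\tfrac{1}{2}w^2 &= K(T) - T\bar{x} - K(T_0) + T_0\bar{x} \\ &= \tfrac{1}{2}K''(T_0)(T-T_0)^2 + \tfrac{1}{6}K'''(T_0)(T-T_0)^3 + \cdots \\ &= \tfrac{1}{2}z^2 + \tfrac{1}{6}\lambda_3(T_0)z^3 + \tfrac{1}{24}\lambda_4(T_0)z^4 + \cdots, \end{aligned} \tag{3.1}$$
where $z = (T - T_0)[K''(T_0)]^{1/2}$, and $w$ is chosen to have the same sign as $\operatorname{Im}(z)$ on the contour. Inversion of the series yields an expansion
$$z = iw + \tfrac{1}{6}\lambda_3(T_0)w^2 + \left[\tfrac{1}{24}\lambda_4(T_0) - \tfrac{5}{72}\lambda_3^2(T_0)\right]iw^3 + \cdots \tag{3.2}$$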


convergent in some neighbourhood of $w = 0$. The contribution to (2.2) from this part of the contour is then
$$\frac{n\,e^{n[K(T_0)-T_0\bar{x}]}}{2\pi i\,[K''(T_0)]^{1/2}}\int_{-A}^{B} e^{-nw^2/2}\,\frac{dz}{dw}\,dw,$$
to which Watson's lemma can be applied. Contributions to the integral from the rest of the contour are negligible since for $T = T_0 + iy$ with $y$ outside $(-\alpha, \beta)$ we have
$$\left|M(T)e^{-T\bar{x}}\right| \le \rho\left|M(T_0)e^{-T_0\bar{x}}\right|$$
for some $\rho < 1$, so that the extra terms contain the factor $\rho^n$ and may be neglected. We thus obtain the asymptotic expansion
$$f_n(\bar{x}) \sim \left[\frac{n}{2\pi K''(T_0)}\right]^{1/2} e^{n[K(T_0)-T_0\bar{x}]}\left(a_0 + \frac{a_1}{n} + \frac{a_2}{n^2} + \cdots\right). \tag{3.3}$$
From the Lagrange expansion of $dz/dw$ we find
$$a_r = \frac{(-1)^r}{2^r\,r!}\left[\frac{d^{2r}}{dz^{2r}}\left\{\frac{z}{i\,w(z)}\right\}^{2r+1}\right]_{z=0}. \tag{3.4}$$


The coefficients of this series can be shown to be identical with those obtained 
by the method of Section 2 (see Appendix). 
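The inversion (3.2) can be checked mechanically. The sketch below (hypothetical helper names, pure Python) solves (3.1) order by order for the coefficients of $z(w)$, keeping terms through $z^4$, and then forms $a_1 = \psi''(0)/2$ with $\psi(w) = (1/i)\,dz/dw$ as in Watson's lemma; the result should match the bracket $\tfrac18\lambda_4 - \tfrac{5}{24}\lambda_3^2$ of (2.6).

```python
def series_mul(a, b, order):
    """Product of two power series (coefficient lists), truncated after w**order."""
    out = [0j] * (order + 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j <= order:
                out[i + j] += ai * bj
    return out

def invert_31(lam3, lam4, order=3):
    """Coefficients c_k of z(w) = sum_k c_k w**k solving (3.1):
    z**2/2 + lam3 z**3/6 + lam4 z**4/24 = -w**2/2, with z = i w + O(w**2).
    The z**5 and higher terms of (3.1) are omitted, so order must be <= 3."""
    F = [0.0, 0.0, 0.5, lam3 / 6, lam4 / 24]     # coefficients of z**p in (3.1)
    z = [0j, 1j]                                 # leading term z = i w
    for k in range(2, order + 1):
        z.append(0j)                             # unknown c_k, provisionally zero
        total = [0j] * (k + 2)
        power = [1 + 0j]
        for p in range(1, len(F)):
            power = series_mul(power, z, k + 1)  # z(w)**p through w**(k + 1)
            for m, c in enumerate(power):
                total[m] += F[p] * c
        # The w**(k+1) coefficient must vanish; c_k enters it only through
        # z**2/2, multiplied by z_1 = i, hence c_k = -residual / i.
        z[k] = -total[k + 1] / 1j
    return z

lam3, lam4 = 2.0, 6.0              # lambda_3, lambda_4 of the exponential distribution
z = invert_31(lam3, lam4)
a1 = (3 * z[3] / 1j).real          # a_1 = psi''(0)/2 with psi(w) = (1/i) dz/dw
print(z[2], z[3], a1, lam4 / 8 - 5 * lam3 ** 2 / 24)
```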


4. A generalisation of the Edgeworth expansion. We now show how the work of Cramér [3], [4] on the Edgeworth series can also be employed to establish the asymptotic nature of (2.6), using a technique similar to that adopted by Cramér [5] and Khinchin [14].

It has been proved that on any admissible path of the form $T = \tau + iy$ the integrand attains its maximum modulus only at $T = \tau$. Consequently (2.6) is only one of a family of series for $f_n(\bar{x})$ which can be derived in a similar way by integrating along $T = \tau + iy$, $\tau$ taking any value in $(-c_1, c_2)$. In particular, $\tau = 0$ gives the Edgeworth series, whose asymptotic character was demonstrated by Cramér [3].


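We have
$$M(T)e^{-T\bar{x}} = \int_{-\infty}^{\infty} e^{T(x-\bar{x})}f(x)\,dx = \int_{-\infty}^{\infty} e^{Tu}f(u+\bar{x})\,du. \tag{4.1}$$
On the path $T = \tau + iy$ we can put
$$M(T)e^{-T\bar{x}} = M(\tau)e^{-\tau\bar{x}}\,\phi(y), \tag{4.2}$$
where
$$\phi(y) = \frac{\int_{-\infty}^{\infty} e^{(\tau+iy)u}f(u+\bar{x})\,du}{\int_{-\infty}^{\infty} e^{\tau u}f(u+\bar{x})\,du}$$
is the characteristic function for a random variable $u$ having the density function $h(u) \propto e^{\tau u}f(u+\bar{x})$. The inversion formula (2.2) then becomes
$$f_n(\bar{x}) = e^{n[K(\tau)-\tau\bar{x}]}\,\frac{n}{2\pi}\int_{-\infty}^{\infty}\phi^n(y)\,dy = e^{n[K(\tau)-\tau\bar{x}]}\,h_n(0),$$
where $h_n(\bar{u})$ is the density function for the mean $\bar{u}$ of $n$ independent $u$'s. Using the fact that
$$\log\phi(y) = [K'(\tau) - \bar{x}]\,iy + \sum_{j=2}^{\infty} K^{(j)}(\tau)\,\frac{(iy)^j}{j!},$$
so that $u$ has mean $K'(\tau) - \bar{x}$ and higher cumulants $K^{(j)}(\tau)$, we may replace $h_n(0)$ by its Edgeworth series and obtain the family of asymptotic expansions
$$f_n(\bar{x}) \sim \left[\frac{n}{2\pi K''(\tau)}\right]^{1/2}\exp n\left\{K(\tau) - \tau\bar{x} - \frac{[K'(\tau)-\bar{x}]^2}{2K''(\tau)}\right\}\left\{1 + \frac{A_1}{n^{1/2}} + \frac{A_2}{n} + \cdots\right\}, \tag{4.3}$$
where
$$A_1 = \tfrac{1}{3!}\lambda_3(\tau)\,H_3\!\left(\{\bar{x} - K'(\tau)\}[n/K''(\tau)]^{1/2}\right),$$
$$A_2 = \tfrac{1}{4!}\lambda_4(\tau)\,H_4\!\left(\{\bar{x} - K'(\tau)\}[n/K''(\tau)]^{1/2}\right) + \tfrac{10}{6!}\lambda_3^2(\tau)\,H_6\!\left(\{\bar{x} - K'(\tau)\}[n/K''(\tau)]^{1/2}\right),$$
etc., the $H$'s being Hermite polynomials.

When $\tau = 0$ this reduces to the Edgeworth series for $f_n(\bar{x})$. (Since $c_1$ or $c_2$ can be zero it may not be possible to take the expansion beyond a certain number of terms in this case.) On the other hand when $\tau = T_0$, so that $K'(T_0) = \bar{x}$, every Hermite polynomial is evaluated at zero, where those of odd order vanish; hence all the odd powers of $n^{-1/2}$ vanish and we get (2.6), which is an expansion in powers of $n^{-1}$. In particular the dominant term $g_n(\bar{x})$ has the same accuracy as the first two terms of the Edgeworth series. Unlike the latter, however, $g_n(\bar{x})$ can never be negative, and is shown in Section 7 to have a further important advantage over the other approximations.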
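To illustrate the contrast numerically (a sketch under assumed inputs, not the paper's computation): for the mean of $n = 10$ exponential variables at $\bar{x} = 0.1$, the two-term $\tau = 0$ member of (4.3) goes negative, while $g_n(\bar{x})$ stays within about one per cent of the exact gamma density. The code takes $H_3(z) = z^3 - 3z$, the probabilists' convention, which is an assumption about the Hermite normalisation intended in (4.3).

```python
import math

n, xbar = 10, 0.1     # far in the lower tail of the mean of n Exp(1) variables (mean 1)

# Exact density: the mean is gamma with shape n and rate n.
exact = math.exp(n * math.log(n) + (n - 1) * math.log(xbar) - n * xbar - math.lgamma(n))

# tau = 0 member of (4.3), two terms: K'(0) = 1, K''(0) = 1, lambda_3(0) = 2 for Exp(1).
zeta = (xbar - 1.0) * math.sqrt(n)
he3 = zeta ** 3 - 3 * zeta                        # probabilists' Hermite polynomial H_3
normal = math.sqrt(n / (2 * math.pi)) * math.exp(-zeta ** 2 / 2)
edgeworth = normal * (1 + 2.0 * he3 / (6 * math.sqrt(n)))

# tau = T0 member: the saddlepoint term g_n, with T0 = 1 - 1/xbar solving 1/(1 - T) = xbar.
t0 = 1 - 1 / xbar
k2 = 1 / (1 - t0) ** 2                            # K''(T0)
saddle = math.sqrt(n / (2 * math.pi * k2)) * math.exp(n * (-math.log(1 - t0) - t0 * xbar))

print(exact, normal, edgeworth, saddle)
```

The normal term alone overstates the density here by more than two orders of magnitude, and the skewness correction overshoots below zero; $g_n$ avoids both failures because the expansion is centred at $T_0$ rather than at $T = 0$.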


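5. Examples. The method is applied to three examples.

EXAMPLE 5.1. $f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\exp\left\{-\dfrac{(x-m)^2}{2\sigma^2}\right\}$, $-\infty < x < \infty$.
$$K(T) = mT + \tfrac{1}{2}\sigma^2T^2,\qquad K'(T) = m + \sigma^2T,$$
$$T_0 = (\bar{x} - m)/\sigma^2,\qquad K''(T_0) = \sigma^2,$$
$$g_n(\bar{x}) = \left(\frac{n}{2\pi}\right)^{1/2}\frac{1}{\sigma}\exp\left\{-\frac{n(\bar{x}-m)^2}{2\sigma^2}\right\}.$$
In this case $g_n(\bar{x}) = f_n(\bar{x})$ for every value of $n$.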
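EXAMPLE 5.2. $f(x) = (c^\alpha/\Gamma(\alpha))\,x^{\alpha-1}e^{-cx}$, $0 \le x < \infty$.
$$K(T) = -\alpha\log(1 - T/c),\qquad T_0 = c - \alpha/\bar{x},\qquad K''(T_0) = \bar{x}^2/\alpha,$$
$$g_n(\bar{x}) = \left(\frac{n\alpha}{2\pi}\right)^{1/2}\left(\frac{ce}{\alpha}\right)^{n\alpha}\bar{x}^{n\alpha-1}e^{-nc\bar{x}}.$$
The exact result is
$$f_n(\bar{x}) = \frac{(nc)^{n\alpha}}{\Gamma(n\alpha)}\,\bar{x}^{n\alpha-1}e^{-nc\bar{x}},$$
which differs from $g_n(\bar{x})$ only in that $\Gamma(n\alpha)$ is replaced by Stirling's approximation in the normalising factor. As this can always be readjusted ultimately to make the total probability unity, we can regard $g_n(\bar{x})$ as being in this sense "exact" for all $n$.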


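EXAMPLE 5.3. $f(x) = \tfrac{1}{2}$, $-1 \le x \le 1$.

The density function for the mean of $n$ independent rectangular variables in $(-1, 1)$ is known to be
$$f_n(\bar{x}) = \frac{n^n}{2^n(n-1)!}\sum_{j=0}^{n}(-1)^j\binom{n}{j}\left\langle 1 - \bar{x} - \frac{2j}{n}\right\rangle^{n-1},$$
where $\langle z\rangle = z$ for $z \ge 0$ and $= 0$ for $z < 0$. (Seal [16] gives a historical note on this result.) We have
$$K(T) = \log\left(\frac{\sinh T}{T}\right),\qquad K'(T_0) = \coth T_0 - \frac{1}{T_0} = \bar{x},\qquad K''(T_0) = \frac{1}{T_0^2} - \operatorname{cosech}^2 T_0,$$
$$g_n(\bar{x}) = \left(\frac{n}{2\pi}\right)^{1/2}\left(\frac{1}{T_0^2} - \operatorname{cosech}^2 T_0\right)^{-1/2}\left(\frac{\sinh T_0}{T_0}\right)^{n}e^{-nT_0\bar{x}}.$$
When $T_0$ is large and positive, $\bar{x} \sim 1 - 1/T_0$ and
$$K(T_0) \sim \log\left(e^{T_0}/2T_0\right),\qquad K''(T_0) \sim 1/T_0^2.$$
So for small $1 - \bar{x}$,
$$g_n(\bar{x}) \sim \left(\frac{n}{2\pi}\right)^{1/2}\left(\frac{e}{2}\right)^n(1-\bar{x})^{n-1},$$
which agrees with $f_n(\bar{x}) = [n^n/2^n(n-1)!](1-\bar{x})^{n-1}$ when $\bar{x} > 1 - 2/n$ except for the normalising constant, and there is similar agreement for $\bar{x}$ near $-1$.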
Actually $\log_e g_n(\bar{x})$ is remarkably close to $\log_e f_n(\bar{x})$ for quite moderate values of $n$ over the whole range of $\bar{x}$. Table 1 shows the agreement for $n = 6$, which could be improved by adjusting the normalising constant. With $n$ as low as 6, $g_n(\bar{x})$ never differs from $f_n(\bar{x})$ by as much as 4 per cent.

TABLE 1
x̄               0.1     0.2     0.3     0.4     0.5     0.6     0.7     0.8
log_e f_6(x̄)    0.419   0.172  −0.249  −0.860  −1.687  −2.778  −4.216  −6.243
log_e g_6(x̄)    0.445   0.199  −0.221  −0.829  −1.653  −2.742  −4.188  −6.228
Difference      0.026   0.027   0.028   0.031   0.034   0.036   0.028   0.015

This example leads one to enquire under what conditions $f_n(\bar{x})/g_n(\bar{x}) \to 1$ uniformly for all $\bar{x}$ as $n \to \infty$, so that the relative accuracy of the approximation is maintained up to the ends of the range of $\bar{x}$. In Section 7 we show that the result is true for a wide class of density functions.
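The entries of Table 1 can be recomputed from the formulas of Example 5.3. This Python sketch (not the author's procedure; function names are hypothetical) evaluates the exact log density and $\log_e g_6(\bar{x})$, finding $T_0$ from $\coth T_0 - 1/T_0 = \bar{x}$ by bisection; it assumes $0 < \bar{x} < 1$, the range covered by the table.

```python
import math

def log_f_exact(xbar, n):
    """log f_n(xbar) for the mean of n uniform(-1, 1) variables (Example 5.3)."""
    total = 0.0
    for j in range(n + 1):
        base = 1 - xbar - 2 * j / n
        if base > 0:                           # the truncated power <z>^(n-1)
            total += (-1) ** j * math.comb(n, j) * base ** (n - 1)
    return n * math.log(n) - n * math.log(2) - math.lgamma(n) + math.log(total)

def log_g_saddle(xbar, n):
    """log g_n(xbar) with K(T) = log(sinh T / T); written for 0 < xbar < 1."""
    lo, hi = 1e-6, 700.0
    for _ in range(200):                       # bisection on K'(T) = coth T - 1/T
        mid = 0.5 * (lo + hi)
        if 1 / math.tanh(mid) - 1 / mid < xbar:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    K = math.log(math.sinh(t) / t)
    K2 = 1 / t ** 2 - 1 / math.sinh(t) ** 2    # K''(T0)
    return 0.5 * math.log(n / (2 * math.pi * K2)) + n * (K - t * xbar)

for xbar in (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8):
    lf, lg = log_f_exact(xbar, 6), log_g_saddle(xbar, 6)
    print(f"{xbar:.1f} {lf:8.3f} {lg:8.3f} {lg - lf:6.3f}")
```

By symmetry the same figures hold at $-\bar{x}$.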


6. The real roots of $K'(T) = \xi$. In this section we discuss the existence and properties of the real roots of the equation $K'(T) = \xi$, upon which the approximation $g_n(\bar{x})$ depends. The conditions are here relaxed so that the distribution need not have a density function. The moment generating function is still assumed to satisfy the conditions of Section 2, namely that
$$M(T) = e^{K(T)} = \int_{-\infty}^{\infty} e^{Tx}\,dF(x)$$
converges for real $T$ in $-c_1 < T < c_2$, where $0 \le c_1 \le \infty$ and $0 \le c_2 \le \infty$ but $c_1 + c_2 > 0$. Throughout this section $T$ is supposed to take real values only.

The distribution may extend from $-\infty$ to $\infty$, or it may be limited at either or both ends. We shall write
$$F(x) = 0,\ \ x < a;\qquad 0 < F(x) < 1,\ \ a < x < b;\qquad F(x) = 1,\ \ x > b,$$
where if desired $a = -\infty$ or $b = \infty$, or both. Note that $b < \infty$ implies $c_2 = \infty$, so that $c_2 < \infty$ implies $b = \infty$, and similarly for $a$ and $c_1$. The converse is not true since $b$ and $c_2$ (or $a$ and $c_1$) can both be infinite.

We now establish the conditions under which $K'(T) = \xi$ has no real root when $\xi$ lies outside the interval $(a, b)$, and has a unique simple root $T_0$ for every $\xi$ in $(a, b)$. It is convenient to consider first the case where both $a$ and $b$ are finite.


THEOREM 6.1. $F(x) = 0$ for $x < a$, and $F(x) = 1$ for $x > b$, if and only if $K(T)$ exists for all real $T$ and $K'(T) = \xi$ has no real root whenever $\xi < a$ or $\xi > b$.

PROOF. Write
$$M(T, \xi) = e^{K(T)-T\xi} = \int_{-\infty}^{\infty} e^{T(x-\xi)}\,dF(x).$$
If $dF(x) = 0$ outside $(a, b)$ then $M(T, \xi)$ exists for all real $T$ and
$$M'(T, \xi) = \int_{-\infty}^{\infty}(x-\xi)\,e^{T(x-\xi)}\,dF(x)$$
exists and has constant sign for all $T$ when $\xi < a$ or $\xi > b$; since $M'(T, \xi) = [K'(T) - \xi]\,M(T, \xi)$, $K'(T) = \xi$ then has no real root.

Conversely, suppose $K(T)$ exists for all $T$ and $K'(T) = \xi$ has no real root when $\xi < a$ or $\xi > b$. Then $M'(T, \xi)$ has constant sign in the domains $\xi < a$, $-\infty < T < \infty$ and $\xi > b$, $-\infty < T < \infty$, so that $M(T, \xi)$ is monotonic in $T$ for these values of $\xi$.

Moreover $M(T, \xi)$ must increase with $T$ for all $\xi < a$, and decrease with $T$ for all $\xi > b$. For if $M(T, \xi)$ increases with $T$, then $dF(x) = 0$ for every $x < \xi$, otherwise $M(-\infty, \xi) = \infty$; and if this were true for all $\xi > b$ we should have $dF(x) = 0$ for all $x$. Similarly $M(T, \xi)$ cannot decrease with $T$ for $\xi < a$. Hence $dF(x) = 0$ for all $x < a$ and for all $x > b$, that is $F(x) = 0$ for $x < a$ and $F(x) = 1$ for $x > b$.
THEOREM 6.2. Let $F(x) = 0$ for $x < a$, $0 < F(x) < 1$ for $a < x < b$, $F(x) = 1$ for $b < x$, where $-\infty < a < b < \infty$. Then for every $\xi$ in $a < \xi < b$ there is a unique simple root $T_0$ of $K'(T) = \xi$. As $T$ increases from $-\infty$ to $\infty$, $K'(T)$ increases continuously from $a$ to $b$.

PROOF. When $a < \xi < b$, $M'(-\infty, \xi) = -\infty$ and $M'(\infty, \xi) = \infty$, and $M'(T, \xi)$ is strictly increasing with $T$ since $M''(T, \xi) > 0$. So for each $\xi$ in $a < \xi < b$ there is a unique root $T_0$ of $M'(T, \xi) = 0$ and hence of $K'(T) = \xi$. Also $K''(T_0) = M''(T_0, \xi)/M(T_0, \xi)$, so that $0 < K''(T_0) < \infty$; thus $T_0$ is a simple root and $K'(T_0)$ is a strictly increasing function of $T_0$.

For all $T$, $M'(T, b) < 0$ and so $K'(T) < b$, but $M'(T, b - \epsilon) \to \infty$ as $T \to \infty$ for every $\epsilon > 0$, so that $K'(T) > b - \epsilon$ for all sufficiently large $T$. Hence $K'(T) \to b$ as $T \to \infty$. Similarly $K'(T) \to a$ as $T \to -\infty$. This also implies $K''(T) \to 0$ as $T \to \pm\infty$.

The theorem has an obvious interpretation in terms of the family of conjugate distributions (the term is due to Khinchin [14])
$$dF(x, T) = C\,e^{Tx}\,dF(x),$$
which have mean $K'(T)$ and variance $K''(T)$.
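For a distribution on a finite interval, Theorem 6.2 and the conjugate family can be checked directly. The sketch below uses an invented three-point distribution on $\{0, 1, 3\}$ (so $a = 0$, $b = 3$); it compares the mean and variance of $dF(x, T)$ with finite-difference estimates of $K'(T)$ and $K''(T)$, and solves $K'(T) = \xi$ by bisection, relying on the monotonicity established above. All names and numbers are illustrative.

```python
import math

# Hypothetical three-point distribution with a = 0 and b = 3 (not from the paper).
support = [0.0, 1.0, 3.0]
probs = [0.5, 0.3, 0.2]

def K(t):
    """Cumulant generating function log sum_r p_r e^{t x_r}."""
    return math.log(sum(p * math.exp(t * x) for x, p in zip(support, probs)))

def conjugate_moments(t):
    """Mean and variance of the conjugate distribution dF(x, T) = C e^{Tx} dF(x)."""
    w = [p * math.exp(t * x) for x, p in zip(support, probs)]
    c = 1 / sum(w)                                     # the constant C = 1 / M(T)
    mean = c * sum(wi * x for wi, x in zip(w, support))
    var = c * sum(wi * (x - mean) ** 2 for wi, x in zip(w, support))
    return mean, var

def solve(xi, lo=-40.0, hi=40.0):
    """Unique root of K'(T) = xi for a < xi < b (Theorem 6.2), by bisection."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if conjugate_moments(mid)[0] < xi:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

h, t = 1e-4, 0.7
print(conjugate_moments(t))
print((K(t + h) - K(t - h)) / (2 * h), (K(t + h) - 2 * K(t) + K(t - h)) / h ** 2)
print(solve(2.0), conjugate_moments(-40.0)[0], conjugate_moments(40.0)[0])
```

The last line shows $K'(T)$ pressed against $a = 0$ and $b = 3$ at the extremes of $T$, as the theorem asserts.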

A complication arises when $a$ and $b$ are allowed to be infinite. Suppose for example that $a$ is finite but $b = \infty$, so that $K(T)$ exists in $-\infty < T < c_2$, where $0 \le c_2 \le \infty$. If $c_2 = \infty$, then $K'(T) \to \infty$ as $T \to \infty$ and the theorems still hold, for however large $\xi$ is, $M'(T, \xi) \to \infty$ as $T \to \infty$ and so $K'(T) > \xi$ for all sufficiently large $T$.

But if $c_2 < \infty$ the corresponding theorems do not hold without a further condition, for it is not necessarily true that $K'(T) \to \infty$ as $T \to c_2$. Consider the class of distributions
$$dF(x) = e^{-cx}\,dG(x),$$
where $\int_a^\infty dG(x) = m_0 < \infty$ and $\int_a^\infty x\,dG(x) = m_1 < \infty$, but $\int_a^\infty e^{\epsilon x}\,dG(x) = \infty$ for all $\epsilon > 0$. Here $K'(T)$ increases from $a$ to $m_1/m_0$ as $T$ increases from $-\infty$ to $c$, but $K(T) = \infty$ for all $T > c$. So for $\xi > m_1/m_0$, $K'(T) = \xi$ has no real root though the distribution may extend to $\infty$.
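A concrete member of this class makes the failure visible. As an assumed illustration take $c = 1$ and $dG(x) = x^{-3}\,dx$ on $(1, \infty)$, so that $m_0 = \tfrac12$, $m_1 = 1$ and $c_2 = 1$; the sketch below computes $K'(T)$ by Simpson's rule after the substitution $x = 1/u$. It should increase towards $m_1/m_0 = 2$ and never exceed it, so $K'(T) = \xi$ has no root for $\xi > 2$.

```python
import math

def simpson(fn, a, b, m=20000):
    """Composite Simpson rule with m (even) panels."""
    h = (b - a) / m
    s = fn(a) + fn(b)
    for k in range(1, m):
        s += (4 if k % 2 else 2) * fn(a + k * h)
    return s * h / 3

def k_prime(t):
    """K'(T) for the assumed case c = 1, dG(x) = x**-3 dx on (1, inf); requires T < 1.

    With x = 1/u, int_1^inf x^j e^{(T-1)x} x^-3 dx = int_0^1 u^(1-j) e^{(T-1)/u} du,
    and the normalising constant of F cancels in the ratio m1 / m0."""
    s = t - 1.0
    def weight(u, power):
        return 0.0 if u == 0.0 else u ** power * math.exp(s / u)
    m0 = simpson(lambda u: weight(u, 1), 0.0, 1.0)   # j = 0
    m1 = simpson(lambda u: weight(u, 0), 0.0, 1.0)   # j = 1
    return m1 / m0

for t in (-5.0, 0.0, 0.5, 0.9, 0.999, 1 - 1e-9):
    print(t, k_prime(t))
```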

The case $a = -\infty$ can be discussed similarly. In the general case where $K(T)$ exists in $-c_1 < T < c_2$ and $a$ and $b$ may be infinite, the conditions
$$\lim_{T\to c_2} K'(T) = b,\qquad \lim_{T\to -c_1} K'(T) = a \tag{6.1}$$
are required for every $\xi$ in $(a, b)$ to have a corresponding $T_0$ in $(-c_1, c_2)$. They will be automatically satisfied except when $a$ or $b$ is infinite and the corresponding $c_1$ or $c_2$ is finite, in which case the appropriate condition has to be stated explicitly. But even when (6.1) is not satisfied the approximation $g_n(\bar{x})$ and the expansion (2.6) can still be used whenever $\bar{x}$ lies within the restricted range of values assumed by $K'(T)$.


7. Accuracy at the ends of the range of x. We return to the distributions having a density function, and examine the accuracy of $g_n(\bar{x})$ and the expansion (2.6) for values of $\bar{x}$ near the ends of its admissible range $(a, b)$, where the approximation might be expected to fail. It is assumed that the appropriate conditions hold for $K'(T) = \xi$ to have a unique real root $T_0$ for every $\xi$ in $(a, b)$.

It has been proved that
$$\left|\frac{f_n(\bar{x})}{g_n(\bar{x})} - 1\right| < \frac{A(\bar{x})}{n}, \tag{7.1}$$
where $A(\bar{x})$ may depend on $\bar{x}$ since it is a function of $T_0$. The family of expansions (4.3) provides similar inequalities, and in particular an inequality of type (7.1) holds for symmetrical distributions when $g_n(\bar{x})$ is replaced by the limiting normal approximation to $f_n(\bar{x})$. But it is well known that the relative accuracy of the normal approximation, and of the Edgeworth series generally, deteriorates in most cases as $\bar{x}$ approaches the ends of its range. For example, if the interval $(a, b)$ is finite and $f_n(\bar{x}) \to 0$ as $\bar{x} \to a$ or $b$, what corresponds to $A(\bar{x})$ in (7.1) becomes intolerably large as $\bar{x}$ approaches $a$ or $b$, since the normal approximation can never be zero.

We now show that for a wide class of distributions $g_n(\bar{x})$ satisfies (7.1) with $A(\bar{x}) = A$, independent of $\bar{x}$, as $\bar{x}$ approaches $a$ or $b$. In fact, for such distributions the asymptotic expansion of $f_n(\bar{x})/g_n(\bar{x})$ given by (2.6) is valid uniformly as $\bar{x} \to a$ or $b$. This will be so if $\lambda_j(T)$ remains bounded as $T \to -c_1$ or $c_2$ for every fixed $j$, so we examine the behaviour of $\lambda_j(T)$ near the ends of the interval. Equivalently, we study the conjugate distributions with density function
$$f(x, T) = C\,e^{Tx}f(x), \tag{7.2}$$
whose $j$th cumulant is $K^{(j)}(T)$. The form of $f(x, T)$ as $T$ approaches $-c_1$ or $c_2$ depends on the behaviour of $f(x)$ as $x$ approaches $a$ or $b$. For the commonest end conditions on $f(x)$, it will appear that $f(x, T)$ approximates either to the gamma form of Example 5.2 or to the normal form as $T \to -c_1$ or $c_2$. In the first case $\lambda_j(T)$ is bounded for given $j$; in the second case $\lambda_j(T) \to 0$, so that $g_n(\bar{x})$, for any $n$, becomes progressively more accurate as $\bar{x} \to b$, its relative error tending to a limiting value which is of smaller order than any power of $n^{-1}$.

We begin by discussing distributions with $b = \infty$ and first consider asymptotic forms of $f(x)$ when $x$ is large for which $f(x, T)$ approximates to the gamma form.


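EXAMPLE 7.1. $f(x) \sim Ax^{\alpha-1}e^{-cx}$, $\alpha > 0$, $c > 0$.

Let $X$ be large. Then
$$M^{(j)}(T) = \int_{-\infty}^{\infty} x^je^{Tx}f(x)\,dx = I_j + \int_X^\infty x^je^{Tx}f(x)\,dx,$$
where $I_j = \int_{-\infty}^{X} x^je^{Tx}f(x)\,dx$ is bounded as $T \to c$, and for small $c - T$,
$$\int_X^\infty x^je^{Tx}f(x)\,dx \sim A\int_X^\infty x^{\alpha+j-1}e^{-(c-T)x}\,dx = \frac{A}{(c-T)^{\alpha+j}}\int_{X(c-T)}^{\infty} w^{\alpha+j-1}e^{-w}\,dw \sim A\,\Gamma(\alpha+j)\,(c-T)^{-\alpha-j}.$$
Thus the moments $M^{(j)}(T)/M(T)$ of $f(x, T)$ are asymptotically $\Gamma(\alpha+j)/[\Gamma(\alpha)(c-T)^j]$, those of a gamma distribution with index $\alpha$, whence
$$\lambda_j(T) \to (j-1)!\,\alpha^{1-j/2}\quad\text{as } T \to c \tag{7.3}$$
for every $j$. In this case $f(x, T)$ tends to the gamma form as $T \to c$. The result is in fact a familiar Abelian theorem for Laplace transforms, and a more general form of it (Doetsch [7], p. 460) can be restated for our purpose as follows.

THEOREM 7.1. Let $f(x) \sim Ax^{\alpha-1}l(x)e^{-cx}$ for $\alpha > 0$ and $c > 0$, where $l(x)$ is continuous and $l(kx)/l(x) \to 1$ as $x \to \infty$ for every $k > 0$. Then, as $T \to c$,
$$M^{(j)}(T) \sim A\,\Gamma(\alpha+j)\,(c-T)^{-\alpha-j}\,l\!\left(\frac{1}{c-T}\right)\quad\text{and}\quad\lambda_j(T) \to (j-1)!\,\alpha^{1-j/2}.$$
This enables us to include end conditions of the form $Ax^{\alpha-1}\log x\,e^{-cx}$ or $Ax^{\alpha-1}\log\log x\,e^{-cx}$, etc. In all such cases $f(x, T)$ tends to the gamma form as $T \to c$.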

In the second class of end conditions $f(x, T)$ approximates to the normal form for limiting values of $T$. We first consider heuristically some typical examples, again with $b = \infty$.


EXAMPLE 7.2. $f(x) \sim A\exp(\beta x^\alpha - cx)$, $\beta > 0$, $c > 0$, $0 < \alpha < 1$.

Here we might expect $\lambda_j(T) \to 0$ as $T \to c$, for when $c - T$ is small the dominant part of $f(x, T)$ lies in the region of large $x$ where
$$f(x, T) \sim CA\exp\{\beta x^\alpha - (c - T)x\}.$$
This has a unique maximum at $x_0 = [\alpha\beta/(c - T)]^{1/(1-\alpha)}$, which is large for small $c - T$. If we put $x = x_0y$ the corresponding density for $y$ is $C'\exp[\beta x_0^\alpha(y^\alpha - \alpha y)]$, which has a sharp maximum at $y = 1$, near which it approximates to the normal form $C''\exp[-\tfrac{1}{2}\beta\alpha(1-\alpha)x_0^\alpha(y-1)^2]$; it is relatively negligible elsewhere.


EXAMPLE 7.3. $f(x) \sim A\exp(-\beta x^\alpha)$, $\beta > 0$, $\alpha > 1$.
In this case $T$ can be indefinitely large. We again expect $\lambda_j(T) \to 0$ as $T \to \infty$, for
$$f(x, T) \sim CA\exp[-\beta x^\alpha + Tx]$$
has a unique maximum at $x_0 = (T/\alpha\beta)^{1/(\alpha-1)}$, which tends to infinity with $T$; with $x = x_0y$ the density for $y$ becomes $C'\exp[\beta x_0^\alpha(\alpha y - y^\alpha)]$, which approximates to $C''\exp[-\tfrac{1}{2}\beta\alpha(\alpha-1)x_0^\alpha(y-1)^2]$ as before.
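A numerical look at Example 7.3, as a sketch with the assumed values $\beta = 1$, $\alpha = 4$ (so $f(x) \propto e^{-x^4}$): the code computes $\lambda_3(T)$ and $\lambda_4(T)$ of the conjugate density by quadrature around the mode $x_0 = (T/4)^{1/3}$; both should shrink towards zero as $T$ grows, in line with the normal limit. The integration window of twelve scale units on each side is a choice made for this log-concave case, not part of the theorem.

```python
import math

def lambdas_3_4(T, h, x0, sd, half_width=12.0, m=4000):
    """lambda_3 and lambda_4 of the conjugate density proportional to exp(T x - h(x)).

    Simpson's rule on x0 +/- half_width * sd (m even), where x0 is the mode and
    sd is a rough scale such as h''(x0)**-0.5."""
    a = x0 - half_width * sd
    step = 2 * half_width * sd / m
    xs = [a + k * step for k in range(m + 1)]
    logd = [T * x - h(x) for x in xs]
    top = max(logd)                                   # rescale to avoid overflow
    wts = [(1 if k in (0, m) else (4 if k % 2 else 2)) * math.exp(ld - top)
           for k, ld in enumerate(logd)]
    mass = sum(wts)
    mean = sum(w * x for w, x in zip(wts, xs)) / mass
    mu2, mu3, mu4 = (sum(w * (x - mean) ** j for w, x in zip(wts, xs)) / mass for j in (2, 3, 4))
    return mu3 / mu2 ** 1.5, (mu4 - 3 * mu2 ** 2) / mu2 ** 2

h = lambda x: x ** 4                  # Example 7.3 with beta = 1, alpha = 4
results = {}
for T in (30.0, 300.0, 3000.0):
    x0 = (T / 4) ** (1 / 3)           # root of h'(x0) = 4 x0**3 = T
    sd = 1 / math.sqrt(12 * x0 ** 2)  # h''(x0)**-0.5
    results[T] = lambdas_3_4(T, h, x0, sd)
    print(T, results[T])
```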

These examples are included in the following general theorem concerning end conditions of the type $f(x) \sim e^{-h(x)}$, where $x^2h''(x) \to \infty$ as $x \to \infty$. Subject to a restriction on the variation of $h''(x)$ it is shown that $\lambda_j(T) \to 0$ in such cases as $T$ tends to its upper limit.

THEOREM 7.2. Let $f(x) \sim e^{-h(x)}$ for large $x$, where $h(x) > 0$ and $0 < h''(x) < \infty$. Let $v(x)$ and $w(x)$ exist such that
$$\text{(i)}\ \ [v(x)]^2h''(x) \to \infty,\qquad \text{(ii)}\ \ e^{-w(x)}h''(x) \to 0$$
monotonically as $x \to \infty$, where
$$v(x) > 0,\qquad |v'(x)| \le \alpha < \infty,\qquad w(x) = \int^x \frac{dt}{v(t)}.$$
Then $\lambda_j(T) \to 0$ as $T$ tends to its upper limiting value.

Examples 7.2 and 7.3 are covered by $v(x) = x/\gamma$ for some $\gamma > 0$, conditions (i) and (ii) reducing to $x^2h''(x) \to \infty$ and $x^{-\gamma}h''(x) \to 0$. For $h(x) = e^x$ one can take $v(x) = \tfrac{1}{2}$, for $h(x) = e^{e^x}$ take $v(x) = \tfrac{1}{2}e^{-x}$, and so on. In all cases $v(x)/x$ is bounded and $w(x)$ increases at least as fast as $\log x$.


PROOF. Since $0 < h''(x) < \infty$, $h'(x)$ is strictly increasing and $h'(x) \to c \le \infty$ as $x \to \infty$. Thus for large $x$, $f(x, T) \sim Ce^{Tx-h(x)}$ has a single maximum at the unique root $x_0$ of $h'(x_0) = T$, where $x_0 \to \infty$ as $T \to c \le \infty$.


The $j$th moment of $f(x, T)$ about $x_0$ is
$$\mu_j(T) = C\int_{-\infty}^{\infty}(x - x_0)^jf(x)e^{Tx}\,dx.$$


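It will be shown that as $T \to c$ the major contribution to the integral comes from within a range $x_0 \pm \epsilon v(x_0)$, where $\epsilon$ is arbitrarily small. Consider first the behaviour of $v(x)$ and $w(x)$ in this interval. Since $|v'(x)| \le \alpha$ as $x \to \infty$ we have, for large $x_0$ and $|x - x_0| \le \epsilon v(x_0)$,
$$1 - \alpha\epsilon \le \frac{v(x)}{v(x_0)} \le 1 + \alpha\epsilon. \tag{7.4}$$
Also, for some $x_1$ in $(x, x_0)$,
$$w(x) - w(x_0) = (x - x_0)\,w'(x_1) = (x - x_0)/v(x_1),$$
so that for $|x - x_0| \le \epsilon v(x_0)$,
$$|w(x) - w(x_0)| \le \frac{\epsilon}{1 - \alpha\epsilon}. \tag{7.5}$$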


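Let $X$ be large, but choose $T$ so that $x_0 > X + j/T$. Then
$$\mu_j(T) \sim C\int_{-\infty}^{X}(x - x_0)^jf(x)e^{Tx}\,dx + C\left(\int_X^{x_0-\epsilon v(x_0)} + \int_{x_0-\epsilon v(x_0)}^{x_0+\epsilon v(x_0)} + \int_{x_0+\epsilon v(x_0)}^{\infty}\right)(x - x_0)^j\exp[xh'(x_0) - h(x)]\,dx = I_1 + I_2 + I_3 + I_4,$$
say. We examine the magnitude of each term as $T \to c$.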
Since $(x_0 - x)^je^{Tx}$ has its maximum at $x = x_0 - j/T > X$, it is increasing for $x \le X$, and so
$$|I_1| < C(x_0 - X)^je^{TX}F(X) < Cx_0^je^{TX}.$$


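For $I_2$,
$$|I_2| \le Ce^{x_0h'(x_0)-h(x_0)}\int_{-\infty}^{x_0-\epsilon v(x_0)}(x_0 - x)^je^{-W(x, x_0)}\,dx, \tag{7.6}$$
where we write $W(x, x_0) = h(x) - h(x_0) - (x - x_0)h'(x_0)$. By condition (ii) of the theorem, when $x \le x_0$,
$$h''(x) \ge e^{w(x)-w(x_0)}h''(x_0) > 0.$$
Since $h$ is convex, $W(x, x_0)/(x_0 - x)$ is non-increasing in $x$ for $x < x_0$, so for $x \le x_0 - \epsilon v(x_0)$,
$$W(x, x_0) \ge \frac{x_0 - x}{\epsilon v(x_0)}\,W(x_0 - \epsilon v(x_0), x_0) \ge \eta\,v(x_0)h''(x_0)(x_0 - x),$$
where $\eta = \tfrac{1}{2}\epsilon e^{-\epsilon/(1-\alpha\epsilon)}$, by (7.5). Hence from (7.6),
$$|I_2| < Ce^{x_0h'(x_0)-h(x_0)}\,j!\,[\eta\,v(x_0)h''(x_0)]^{-j-1}.$$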
For $I_3$,
$$I_3 = Ce^{x_0h'(x_0)-h(x_0)}\left(\int_{x_0-\epsilon v(x_0)}^{x_0} + \int_{x_0}^{x_0+\epsilon v(x_0)}\right)(x - x_0)^je^{-W(x, x_0)}\,dx = Ce^{x_0h'(x_0)-h(x_0)}\{J_1 + J_2\},$$
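say. When $x_0 - \epsilon v(x_0) \le x \le x_0$ we have from (i) and (ii),
$$e^{-\epsilon/(1-\alpha\epsilon)} \le \frac{h''(x)}{h''(x_0)} \le \left[\frac{v(x_0)}{v(x)}\right]^2, \tag{7.7}$$
and so from (7.4) and (7.5),
$$\tfrac{1}{2}h''(x_0)(x - x_0)^2e^{-\epsilon/(1-\alpha\epsilon)} \le W(x, x_0) \le \tfrac{1}{2}h''(x_0)(x - x_0)^2(1 - \alpha\epsilon)^{-2}.$$
Putting $u = (x - x_0)[h''(x_0)]^{1/2}$ in $J_1$ makes the lower limit of integration become $-\epsilon v(x_0)[h''(x_0)]^{1/2}$, which tends to $-\infty$ as $x_0 \to \infty$ for fixed $\epsilon$, by (i). Hence
$$J_1 = (-1)^j\,\frac{2^{(j-1)/2}\,\Gamma[(j+1)/2]}{[h''(x_0)]^{(j+1)/2}}\{1 + O(\epsilon)\}.$$
In the range $x_0 \le x \le x_0 + \epsilon v(x_0)$ the inequalities (7.7) are reversed and
$$\tfrac{1}{2}h''(x_0)(x - x_0)^2(1 + \alpha\epsilon)^{-2} \le W(x, x_0) \le \tfrac{1}{2}h''(x_0)(x - x_0)^2e^{\epsilon/(1-\alpha\epsilon)},$$
so that
$$J_2 = \frac{2^{(j-1)/2}\,\Gamma[(j+1)/2]}{[h''(x_0)]^{(j+1)/2}}\{1 + O(\epsilon)\}.$$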


Hence if $j$ is even
$$I_3 = Ce^{x_0h'(x_0)-h(x_0)}\,\frac{(2\pi)^{1/2}\,j!}{2^{j/2}(j/2)!\,[h''(x_0)]^{(j+1)/2}}\{1 + O(\epsilon)\},$$
while if $j$ is odd
$$I_3 = Ce^{x_0h'(x_0)-h(x_0)}\,\frac{O(\epsilon)}{[h''(x_0)]^{(j+1)/2}}.$$
For $I_4$,
$$I_4 = Ce^{x_0h'(x_0)-h(x_0)}\int_{x_0+\epsilon v(x_0)}^{\infty}(x - x_0)^je^{-W(x, x_0)}\,dx.$$
The inequality $h''(x) \ge [v(x_0)/v(x)]^2h''(x_0) > 0$ shows, as with $I_2$, that
$$|I_4| < Ce^{x_0h'(x_0)-h(x_0)}\,j!\left[\tfrac{1}{2}\epsilon(1 + \alpha\epsilon)^{-2}v(x_0)h''(x_0)\right]^{-j-1}.$$


We now show that $I_3$ is the dominant term. First let $j$ be even. As $T \to c$, both $I_2/I_3$ and $I_4/I_3$ are $O\{[v^2(x_0)h''(x_0)]^{-(j+1)/2}\}$ and so tend to 0 for fixed $\epsilon$, by (i). Further,
$$|I_1/I_3| = O\left[x_0^j\{h''(x_0)\}^{(j+1)/2}e^{-W(X, x_0)}\right].$$
From (ii), $h''(x_0) < e^{w(x_0)}$ as $x_0$ becomes large. Also since $v(x)/x$ is bounded, (i) implies that $(x - X)h''(x)v(x) \to \infty$, and so for all large enough $x_0$
$$W(X, x_0) = \int_X^{x_0}(x - X)h''(x)\,dx > A\int_X^{x_0}\frac{dx}{v(x)} = A\{w(x_0) - w(X)\}$$
whatever $A > 0$. Thus
$$|I_1/I_3| = O\left[\exp\left\{j\log x_0 - \left(A - \tfrac{j+1}{2}\right)w(x_0)\right\}\right],$$
which tends to zero as $T \to c$ for fixed $X$ if $A$ is large enough, since $w(x)$ increases at least as fast as $\log x$.
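It follows that for even $j$,
$$\mu_j(T) \sim Ce^{x_0h'(x_0)-h(x_0)}\,\frac{(2\pi)^{1/2}\,j!}{2^{j/2}(j/2)!\,[h''(x_0)]^{(j+1)/2}},\qquad\text{so that}\qquad \mu_j(T) \sim \frac{j!}{2^{j/2}(j/2)!}\,[h''(x_0)]^{-j/2},$$
since $\mu_0(T) = 1$. Similarly when $j$ is odd,
$$\mu_j(T) = [h''(x_0)]^{-j/2}\,O(\epsilon)$$
as $T \to c$, so the odd moments can be made relatively negligible for arbitrarily small $\epsilon$. Thus the moments tend to those of the normal distribution and $\lambda_j(T) \to 0$ as $T \to c$.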

Turning now to the case where $x \le b < \infty$, we consider forms of $f(x)$ when $b - x$ is small. Again there are found to be two classes of end conditions for which $\lambda_j(T)$ is bounded as $T \to \infty$, where $f(x, T)$ tends respectively to the gamma and to the normal form. It is convenient to put $u = b - x$ and regard $(-1)^jK^{(j)}(T)$ as the $j$th cumulant of the distribution of $u$ with density $f(b - u, T) = Be^{-Tu}f(b - u)$ for $u \ge 0$.


EXAMPLE 7.4. $f(x) \sim A(b - x)^{\alpha-1}$, $\alpha > 0$.

The $j$th moment of $u$ about its origin is
$$B\int_0^\infty u^je^{-Tu}f(b - u)\,du \sim BA\int_0^\delta u^{\alpha+j-1}e^{-Tu}\,du + B\int_\delta^\infty u^je^{-Tu}f(b - u)\,du \sim BA\,\frac{\Gamma(\alpha+j)}{T^{\alpha+j}},\qquad T \to \infty,$$
where $\delta$ is small, the remainder being $O(e^{-\delta T})$ for large $T$. It follows that $\lambda_j(T) \to (-1)^j(j-1)!\,\alpha^{1-j/2}$. As in Example 7.1 this is a well known result on Laplace transforms, and its more general form (Doetsch [7], p. 476) yields the following theorem.

THEOREM 7.3. Let $f(x) \sim A(b - x)^{\alpha-1}l(b - x)$ for $\alpha > 0$, where $l(u)$ is continuous and $l(ku)/l(u) \to 1$ as $u \to 0$ for every $k > 0$. Then $\lambda_j(T) \to (-1)^j(j-1)!\,\alpha^{1-j/2}$.

For example $l(u)$ could be $\log(1/u)$ or $\log\log(1/u)$, etc.
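Theorem 7.3 can be seen at work in Example 5.3: near $b = 1$ the rectangular density is constant, so $\alpha = 1$ and $\lambda_3(T)$ should tend to $(-1)^3\,2!\,\alpha^{-1/2} = -2$ as $T \to \infty$, a bounded limit consistent with the even accuracy seen in Table 1. The short sketch below evaluates $\lambda_3(T)$ from closed forms of $K''$ and $K'''$ obtained here by differentiating $K(T) = \log(\sinh T/T)$; the function name is hypothetical.

```python
import math

def lambda3_uniform(t):
    """lambda_3(T) = K'''(T) / K''(T)**1.5 for K(T) = log(sinh T / T), T > 0."""
    csch2 = 1 / math.sinh(t) ** 2
    k2 = 1 / t ** 2 - csch2                       # K''(T)
    k3 = -2 / t ** 3 + 2 * csch2 / math.tanh(t)   # K'''(T)
    return k3 / k2 ** 1.5

for t in (1.0, 5.0, 50.0):
    print(t, lambda3_uniform(t))
```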

The second class of end conditions is typified by the following example.

EXAMPLE 7.5. $f(x) \sim A\exp[-\beta/(b - x)^\gamma]$, $\beta > 0$, $\gamma > 0$.
As in Example 7.2 we expect $\lambda_j(T) \to 0$ as $T \to \infty$, for
$$Ce^{-Tu}f(b - u) \sim CA\exp[-Tu - \beta/u^\gamma]$$
has a unique maximum at $u_0 = (\beta\gamma/T)^{1/(1+\gamma)}$, and the density function for $y = u/u_0$ is
$$C'\exp[-\beta u_0^{-\gamma}(\gamma y + y^{-\gamma})] \sim C''\exp[-\tfrac{1}{2}\beta\gamma(\gamma+1)u_0^{-\gamma}(y - 1)^2].$$


The general theorem analogous to Theorem 7.2 is:

THEOREM 7.4. Let $f(x) \sim e^{-h(x)}$ for small $b - x$, where $h(x) > 0$ and $0 < h''(x) < \infty$. Let $v(u)$ and $w(u)$ exist such that
$$\text{(i)}\ \ [v(b - x)]^2h''(x) \to \infty,\qquad \text{(ii)}\ \ e^{w(b-x)}h''(x) \to 0$$
monotonically as $x \to b$, where $v(0) = 0$ and $w(u) = \int [1/v(u)]\,du$, and $0 < v'(u) \le \alpha < \infty$ for $u > 0$. Then $\lambda_j(T) \to 0$ as $T \to \infty$.


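As before $h'(x)$ is strictly increasing, and $h'(x) \to \infty$ as $x \to b$ since (i) implies $(b - x)^2h''(x) \to \infty$ (because $v(u) \le \alpha u$); so $h'(x) = T$ has a unique root $x_0$, where $x_0 \to b$ as $T \to \infty$. Thus $f(b - u, T)$ has a unique maximum at $u_0 = b - x_0$ for large $T$, and $u_0 \to 0$ as $T \to \infty$. The $j$th moment of $u$ about $u_0$ is
$$\mu_j(T) = B\int_0^\infty(u - u_0)^je^{-Tu}f(b - u)\,du.$$
We write
$$\int_0^\infty = \int_0^{u_0-\epsilon v(u_0)} + \int_{u_0-\epsilon v(u_0)}^{u_0+\epsilon v(u_0)} + \int_{u_0+\epsilon v(u_0)}^{\delta} + \int_\delta^\infty,$$
where $\epsilon$ and $\delta$ are small. The proof then proceeds with appropriate modifications as in Theorem 7.2.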







8. Discrete variables. The discussion has so far been concerned with approximations to probability densities, but the saddlepoint method provides similar approximations to probabilities when the variable is discrete, and indeed it is typically used for this purpose in statistical mechanics. Consider, for example, a variable $x$ which takes only integral values $x = r$ with nonzero probabilities $p(r)$. The moment generating function
$$M(T) = e^{K(T)} = \sum_r p(r)\,e^{rT} \tag{8.1}$$
is assumed to satisfy conditions (6.1).


The mean $\bar{x}$ of $n$ independent $x$'s can take only values $\bar{x} = r/n$, for which the probabilities are
$$p_n(\bar{x}) = \frac{1}{2\pi i}\int_{\tau-i\pi}^{\tau+i\pi} e^{n[K(T)-T\bar{x}]}\,dT, \tag{8.2}$$
analogous to (2.2). The contour is again chosen to be the line $T = T_0 + iy$ passing through the unique real saddlepoint $T_0$, but it now terminates at $T_0 \pm i\pi$. This ensures that the integrand attains its maximum modulus at $T_0$ but nowhere else on the contour, provided we exclude cases where $p(r) = 0$ except at multiples of an integer greater than unity. The discussion of Section 2 shows that the maximum modulus is attained when $y$ satisfies $\cos(ry - \alpha) = 1$ for some $\alpha$ and all integral $r$ with $p(r) > 0$, and $y = 0$ is the only possible value in $(-\pi, \pi)$. The argument then proceeds as before and leads to the approximation
$$p_n(\bar{x}) \sim [2\pi nK''(T_0)]^{-1/2}\,e^{n[K(T_0)-T_0\bar{x}]}\{1 + O(n^{-1})\}, \tag{8.3}$$
where $\bar{x} = r/n$ and $r$ is an integer.
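As an example, consider the binomial distribution
$$p(r) = \binom{N}{r}(1 - p)^{N-r}p^r,\qquad r = 0, 1, \ldots, N,$$
$$K(T) = N\log\{1 + p(e^T - 1)\},\qquad K'(T_0) = Npe^{T_0}/\{1 + p(e^{T_0} - 1)\} = \bar{x}.$$
Here
$$e^{T_0} = \frac{\bar{x}}{N - \bar{x}}\cdot\frac{1 - p}{p},\qquad K''(T_0) = \bar{x}(N - \bar{x})/N,$$
$$p_n(\bar{x}) \sim \frac{N^{nN+1/2}\,p^{n\bar{x}}(1 - p)^{n(N-\bar{x})}}{(2\pi n)^{1/2}(N - \bar{x})^{n(N-\bar{x})+1/2}\,\bar{x}^{n\bar{x}+1/2}}\{1 + O(n^{-1})\}.$$
This is the familiar intermediate form obtained on replacing the factorials by Stirling's approximation before passing to the normal limit.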
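The binomial case is easy to check against exact probabilities. The sketch below (hypothetical helper names, standard library only) evaluates (8.3) and compares it with the exact probability that a Binomial$(nN, p)$ variable equals $r$, using the fact that the sum of $n$ independent Binomial$(N, p)$ variables is Binomial$(nN, p)$; the parameter values are arbitrary.

```python
import math

def binomial_saddlepoint(r, n, N, p):
    """(8.3) for the mean of n independent Binomial(N, p) variables at xbar = r/n, 0 < r < nN."""
    xbar = r / n
    t0 = math.log(xbar / (N - xbar) * (1 - p) / p)    # solves K'(T0) = xbar
    K = N * math.log1p(p * math.expm1(t0))            # K(T0) = N log{1 + p(e^T0 - 1)}
    K2 = xbar * (N - xbar) / N                        # K''(T0)
    return math.exp(n * (K - t0 * xbar)) / math.sqrt(2 * math.pi * n * K2)

def binomial_exact(r, n, N, p):
    """Exact probability: the sum of the n variables is Binomial(nN, p)."""
    return math.comb(n * N, r) * p ** r * (1 - p) ** (n * N - r)

n, N, p = 20, 5, 0.3
for r in (5, 30, 60, 95):
    print(r, binomial_saddlepoint(r, n, N, p) / binomial_exact(r, n, N, p))
```

Away from the very ends ($r$ close to $0$ or $nN$) the ratio stays within a few per cent, the discrepancy being the Stirling error in the factorials.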


9. Ratio of sums of random variables. The saddlepoint technique can also be applied to the distribution of ratios. Cramér [4] has shown that if x and y are two independent random variables with densities $f_1(x)$ and $f_2(y)$ and characteristic functions $\phi_1(t)$ and $\phi_2(u)$, and if $y \ge 0$, the density function for $r = x/y$ is given by

$$f(r) = \frac{1}{2\pi i}\int_{-\infty}^{\infty}\phi_1(t)\,\phi_2'(-rt)\,dt \tag{9.1}$$


provided y has a finite mean. (Gurland [11] relaxes this condition by introducing principal values. Cramér states the condition differently and appears to require unnecessarily that x shall have a finite mean also.) Cramér deduced the result from the distribution of $x - ry$ for fixed r, but it also follows on applying Parseval's theorem to the formula

$$f(r) = \int_0^\infty f_1(ry)\,f_2(y)\,y\,dy \tag{9.2}$$

where y must have a finite mean to make $\phi_2'(-rt)$ the Fourier transform of $iyf_2(y)$. In terms of cumulant generating functions (9.1) takes the form

$$f(r) = \frac{1}{2\pi i}\int_{-i\infty}^{i\infty} e^{K_1(T) + K_2(-rT)}\,K_2'(-rT)\,dT.$$
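As a numerical sanity check of (9.1) and (9.2), the sketch below assumes, purely for illustration, that x and y are independent standard exponential variables, for which the ratio $r = x/y$ has density $1/(1+r)^2$. It evaluates the Fourier form (9.1) by the trapezoidal rule on a truncated range and the direct form (9.2) by the same rule; the truncation limits, step counts and function names are arbitrary choices of this sketch.

```python
import math


def phi_exp(t):
    """Characteristic function of the density e^{-x}, x > 0."""
    return 1.0 / (1.0 - 1j * t)


def dphi_exp(u):
    """Derivative of phi_exp with respect to its argument."""
    return 1j / (1.0 - 1j * u) ** 2


def density_via_9_1(r, L=1000.0, m=50000):
    """(9.1) with phi1 = phi2 = phi_exp: trapezoidal rule on [-L, L]."""
    h = 2.0 * L / m
    total = 0.0j
    for k in range(m + 1):
        t = -L + k * h
        weight = 0.5 if k in (0, m) else 1.0
        total += weight * phi_exp(t) * dphi_exp(-r * t)
    return (total * h / (2j * math.pi)).real


def density_via_9_2(r, upper=60.0, m=60000):
    """(9.2) with f1 = f2 = e^{-x}: trapezoidal rule for the y-integral on [0, upper]."""
    h = upper / m
    total = 0.0
    for k in range(m + 1):
        y = k * h
        weight = 0.5 if k in (0, m) else 1.0
        total += weight * math.exp(-r * y) * math.exp(-y) * y
    return total * h


if __name__ == "__main__":
    for r in (0.5, 1.0, 2.0):
        print(r, density_via_9_1(r), density_via_9_2(r), 1.0 / (1.0 + r) ** 2)
```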


Let $x_1, x_2, \ldots, x_{n_1}$ and $y_1, y_2, \ldots, y_{n_2}$ be independent random samples from these distributions, their sums being X and Y. The density for $R = X/Y$ is then

$$f_{n_1n_2}(R) = \frac{n_2}{2\pi i}\int_{T'-i\infty}^{T'+i\infty} e^{n_1K_1(T) + n_2K_2(-RT)}\,K_2'(-RT)\,dT.$$

When $n_1$ and $n_2$ are large, an approximation is found by passing the path of integration through a saddlepoint $T_0$ of the exponential part of the integrand, given by

$$n_1K_1'(T_0) - n_2RK_2'(-RT_0) = 0. \tag{9.3}$$

Assuming conditions (6.1) to be satisfied, both $K_1'(T)$ and $K_2'(T)$ are increasing functions of T taking every admissible value of x and y respectively as T varies over its appropriate interval, so that to every R there is a single real root $T_0$ of (9.3). (However, it is possible for the same $T_0$ to correspond to more than one value of R, since $TK_2'(T)$ is not necessarily monotonic and so $dT_0/dR$ may change sign.) Proceeding as before, expanding $K_2'(-RT)$ also, we obtain an asymptotic expansion whose dominant term is


$$f_{n_1n_2}(R) \sim \frac{n_2K_2'(-RT_0)\,e^{n_1K_1(T_0) + n_2K_2(-RT_0)}}{[2\pi\{n_1K_1''(T_0) + n_2R^2K_2''(-RT_0)\}]^{1/2}}\{1 + \cdots\},$$

the remainder being relatively $O(n^{-1})$ where $n = \min(n_1, n_2)$.


EXAMPLE 9.1.

$$f_1(x) = \frac{\beta_1^{\alpha_1}}{\Gamma(\alpha_1)}x^{\alpha_1-1}e^{-\beta_1x}, \qquad f_2(y) = \frac{\beta_2^{\alpha_2}}{\Gamma(\alpha_2)}y^{\alpha_2-1}e^{-\beta_2y},$$

where x, y, $\alpha_1$, $\beta_1$, $\alpha_2$, $\beta_2$ are all positive. In this case

$$T_0 = \frac{n_2\alpha_2\beta_1R - n_1\alpha_1\beta_2}{R(n_1\alpha_1 + n_2\alpha_2)}.$$

The approximation is found to be

$$f_{n_1n_2}(R) \sim \frac{(n_1\alpha_1 + n_2\alpha_2)^{n_1\alpha_1+n_2\alpha_2-1/2}\,\beta_1^{n_1\alpha_1}\beta_2^{n_2\alpha_2}\,R^{n_1\alpha_1-1}}{(2\pi)^{1/2}(n_1\alpha_1)^{n_1\alpha_1-1/2}(n_2\alpha_2)^{n_2\alpha_2-1/2}(\beta_1R + \beta_2)^{n_1\alpha_1+n_2\alpha_2}},$$

which differs from the exact density function only in the normalising constant, and so is "exact" in the sense of Example 5.2. This suggests that there may again be a class of distributions for which the relative error is bounded uniformly over the whole range of R for every n.
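The following Python sketch applies the general ratio approximation to Example 9.1. It solves (9.3) for $T_0$ by bisection (the left-hand side increases from $-\infty$ to $+\infty$ on the admissible interval $-\beta_2/R < T < \beta_1$), evaluates the dominant term, and compares it with the exact density of X/Y, where X and Y are gamma variables with shapes $n_1\alpha_1$, $n_2\alpha_2$ and rates $\beta_1$, $\beta_2$. The parameter values and function names are hypothetical.

```python
import math


def gamma_K(T, a, b):
    """CGF of a gamma variable with shape a and rate b: K(T) = -a log(1 - T/b), T < b."""
    return -a * math.log1p(-T / b)


def gamma_K1(T, a, b):
    """K'(T) for the same gamma variable."""
    return a / (b - T)


def gamma_K2(T, a, b):
    """K''(T) for the same gamma variable."""
    return a / (b - T) ** 2


def saddlepoint_T0(R, n1, a1, b1, n2, a2, b2):
    """Root of (9.3), n1 K1'(T) - n2 R K2'(-RT) = 0, found by bisection on (-b2/R, b1)."""
    lo, hi = -b2 / R, b1
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        g = n1 * gamma_K1(mid, a1, b1) - n2 * R * gamma_K1(-R * mid, a2, b2)
        if g < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def saddlepoint_density(R, n1, a1, b1, n2, a2, b2):
    """Dominant term of the saddlepoint expansion for the density of R = X/Y."""
    T0 = saddlepoint_T0(R, n1, a1, b1, n2, a2, b2)
    u0 = -R * T0
    expo = n1 * gamma_K(T0, a1, b1) + n2 * gamma_K(u0, a2, b2)
    curv = n1 * gamma_K2(T0, a1, b1) + n2 * R * R * gamma_K2(u0, a2, b2)
    return n2 * gamma_K1(u0, a2, b2) * math.exp(expo) / math.sqrt(2.0 * math.pi * curv)


def exact_density(R, n1, a1, b1, n2, a2, b2):
    """Exact density of X/Y with X ~ Gamma(n1 a1, rate b1), Y ~ Gamma(n2 a2, rate b2)."""
    A, B = n1 * a1, n2 * a2
    log_f = (math.lgamma(A + B) - math.lgamma(A) - math.lgamma(B)
             + A * math.log(b1) + B * math.log(b2)
             + (A - 1.0) * math.log(R) - (A + B) * math.log(b1 * R + b2))
    return math.exp(log_f)


if __name__ == "__main__":
    args = (3, 2.0, 1.5, 4, 1.5, 0.7)  # n1, alpha1, beta1, n2, alpha2, beta2
    for R in (0.2, 1.0, 5.0):
        print(R, saddlepoint_density(R, *args) / exact_density(R, *args))
```

The printed ratio should be the same for every R; with $A = n_1\alpha_1$ and $B = n_2\alpha_2$ it is the ratio of the Stirling approximation of $\Gamma(A+B)/\{\Gamma(A)\Gamma(B)\}$ to its exact value.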

An extension of (9.1) is available when the variables are not independent (Cramér [6] p. 317, ex. 6; Geary [10]). If (x, y) has a bivariate density function $f(x, y)$ everywhere and characteristic function $\phi(t, u)$, and if $y \ge 0$, the density function for $r = x/y$ is

$$f(r) = \frac{1}{2\pi i}\int_{-\infty}^{\infty}\left[\frac{\partial\phi(t, u)}{\partial u}\right]_{u=-rt}dt \tag{9.4}$$

provided the integral is absolutely convergent, which requires y to have a finite mean. The following proof of (9.4) shows the integrand to be proportional to a characteristic function which attains its maximum modulus only at $t = 0$, so that the previous methods are applicable. Corresponding to (9.2) we have

$$f(r) = \int_0^\infty f(ry, y)\,y\,dy. \tag{9.5}$$


Write $\mu = E(y)$ and define a new distribution with density and characteristic function

$$h(x, y) = \frac{1}{\mu}\,yf(x, y), \qquad \psi(t, u) = \frac{1}{i\mu}\frac{\partial\phi(t, u)}{\partial u}. \tag{9.6}$$

From (9.5) it is seen that $(1/\mu)f(r)$ can be regarded as the probability density at zero of the variable $w = x - ry$, where (x, y) has the distribution (9.6). The result then follows from the fact that w has the characteristic function

$$\frac{1}{i\mu}\left[\frac{\partial\phi(t, u)}{\partial u}\right]_{u=-rt},$$

whose inversion at $w = 0$, multiplied by $\mu$, gives (9.4).


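For a random sample of n, the ratio R of the sums X and Y has density

$$f_n(R) = \frac{n}{2\pi i}\int_{T'-i\infty}^{T'+i\infty} e^{nK(T,-RT)}\left[\frac{\partial K(T, U)}{\partial U}\right]_{U=-RT}dT$$

in terms of the bivariate cumulant generating function $K(T, U)$. Since $\partial K(T, -RT)/\partial R = -T[\partial K(T, U)/\partial U]_{U=-RT}$, the saddlepoint approximation is

$$f_n(R) \sim \left[\frac{n}{2\pi K''(T_0, -RT_0)}\right]^{1/2}e^{nK(T_0,-RT_0)}\left\{-\frac{1}{T_0}\frac{\partial K(T_0, -RT_0)}{\partial R}\right\}$$

where $T_0$ is given by

$$\frac{dK(T_0, -RT_0)}{dT_0} = 0,$$

$K''(T_0, -RT_0)$ denotes the second derivative of $K(T, -RT)$ with respect to T at $T_0$, and the derivative with respect to R is taken with $T_0$ held fixed.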
EXAMPLE 9.2. Let $x = \tfrac12u^2$ and $y = \tfrac12v^2$, where u and v have a bivariate normal distribution with unit variances and correlation coefficient $\rho$. Thus $R = X/Y$ is a "variance ratio" calculated from two equal correlated samples. The exact distribution of R has been given by Bose [1] and Finney [8]. We find

$$K(T, -RT) = -\tfrac12\log\{1 + (R - 1)T - RT^2(1 - \rho^2)\},$$

$$T_0 = \frac{R - 1}{2R(1 - \rho^2)}, \qquad K''(T_0, -RT_0) = \frac{4R^2(1 - \rho^2)^2}{(1 + R)^2 - 4\rho^2R},$$

$$-\frac{1}{T_0}\frac{\partial K(T_0, -RT_0)}{\partial R} = \frac{(R + 1)(1 - \rho^2)}{(1 + R)^2 - 4\rho^2R},$$

$$f_n(R) \sim 2^{n-1}\left(\frac{n}{2\pi}\right)^{1/2}\frac{(1 - \rho^2)^{n/2}(1 + R)R^{n/2-1}}{[(1 + R)^2 - 4\rho^2R]^{(n+1)/2}},$$

which again agrees with the exact distribution except for the normalising constant.
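The sketch below evaluates the bivariate saddlepoint approximation generically, starting only from the joint cumulant generating function $K(T, U) = -\tfrac12\log\{(1 - T)(1 - U) - \rho^2TU\}$ of $(\tfrac12u^2, \tfrac12v^2)$ (this closed form is derived here from the bivariate normal and is an assumption of the sketch, not a quotation). $T_0$ is found by bisection on a numerical derivative, $K''$ by a central second difference, and the last factor as $[\partial K(T_0, U)/\partial U]_{U=-RT_0}$, which equals $-(1/T_0)\partial K(T_0, -RT_0)/\partial R$ but stays finite at R = 1. Step sizes, parameter values and names are illustrative.

```python
import math


def K_joint(T, U, rho):
    """Joint CGF of (u^2/2, v^2/2) for a standard bivariate normal (u, v) with correlation rho."""
    return -0.5 * math.log((1.0 - T) * (1.0 - U) - rho * rho * T * U)


def saddlepoint_ratio_density(R, n, rho, h=1e-4, d=1e-6):
    """Saddlepoint approximation to the density of R = X/Y, built from K(T, U) alone."""
    def k(T):
        return K_joint(T, -R * T, rho)  # K(T, -RT)

    def dk(T):
        return (k(T + d) - k(T - d)) / (2.0 * d)  # numerical dK(T, -RT)/dT

    # K(T, -RT) is finite between the roots of 1 + (R - 1)T - R(1 - rho^2)T^2 = 0.
    c = R * (1.0 - rho * rho)
    disc = math.sqrt((R - 1.0) ** 2 + 4.0 * c)
    lo, hi = ((R - 1.0) - disc) / (2.0 * c), ((R - 1.0) + disc) / (2.0 * c)
    margin = 0.01 * (hi - lo)
    lo, hi = lo + margin, hi - margin
    for _ in range(100):  # dk is increasing, so bisection finds dk(T0) = 0
        mid = 0.5 * (lo + hi)
        if dk(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    T0 = 0.5 * (lo + hi)
    U0 = -R * T0
    K2 = (k(T0 + h) - 2.0 * k(T0) + k(T0 - h)) / h ** 2  # K''(T0, -RT0)
    dK_dU = (K_joint(T0, U0 + d, rho) - K_joint(T0, U0 - d, rho)) / (2.0 * d)
    return math.sqrt(n / (2.0 * math.pi * K2)) * math.exp(n * k(T0)) * dK_dU


def closed_form_density(R, n, rho):
    """The closed-form approximation stated in Example 9.2."""
    q = (1.0 + R) ** 2 - 4.0 * rho * rho * R
    return (2.0 ** (n - 1) * math.sqrt(n / (2.0 * math.pi)) * (1.0 - rho * rho) ** (n / 2.0)
            * (1.0 + R) * R ** (n / 2.0 - 1.0) / q ** ((n + 1) / 2.0))


if __name__ == "__main__":
    for R in (0.3, 1.0, 4.0):
        print(R, saddlepoint_ratio_density(R, 10, 0.6), closed_form_density(R, 10, 0.6))
```

The agreement with the closed form is limited only by the finite-difference step sizes.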

In the most general situation where the sample members are themselves cor- 
related, the saddlepoint method can still be applied. In each particular case the 
contribution to the integral from parts of the contour outside a neighbourhood of 
the saddlepoint must be established as negligible. One can obtain, in this way, an
approximation to the distribution of the sample serial correlation coefficient of 
lag 1 from a linear Markoff population. With the usual “circular” definitions it 
turns out to be the approximation given by Leipnik [15], but a similar approxima- 
tion can also be found for the noncircular case. A detailed account of this work 
will appear elsewhere. 


10. Acknowledgments. I am much indebted to D. V. Lindley and Sir Harold 
Jeffreys for many stimulating discussions and useful suggestions, to a referee for 
comments which led to improvements in the paper, and to D. A. East for com- 
puting Table 1. 


11. Appendix. The identity of the series (2.6) and (3.3) may be established as follows. For the contour $T = T_0 + iy$ the inversion formula is

$$f_n(\bar x) = \frac{n}{2\pi}e^{n[K(T_0) - T_0\bar x]}\int_{-\infty}^{\infty}e^{-nw^2/2}\,dy$$

where $w^2$ is defined by (3.2). With $v = y[nK''(T_0)]^{1/2}$ and $s = n^{-1/2}$ this becomes

$$f_n(\bar x) = \frac{1}{2\pi}\left[\frac{n}{K''(T_0)}\right]^{1/2}e^{n[K(T_0) - T_0\bar x]}\int_{-\infty}^{\infty}e^{-w^2(ivs)/2s^2}\,dv$$

with $z = ivs$ in (3.2). To get (2.5) the integrand is expanded as a power series in s. Term-by-term integration gives (2.6). Thus

$$\exp\left\{-\frac{w^2(ivs)}{2s^2}\right\} = \sum_{m=0}^{\infty}b_m(v)\,s^m$$

where

$$b_m(v) = \frac{1}{m!}\left[\frac{d^m}{ds^m}\exp\left\{-\frac{w^2(ivs)}{2s^2}\right\}\right]_{s=0} = \frac{v^m}{m!}\left[\frac{d^m}{dx^m}\exp\left\{-\frac{v^2w^2(ix)}{2x^2}\right\}\right]_{x=0}.$$

Since $w^2(ix)/x^2 = 1 + O(x)$ for small x, we can interchange the order of differentiation with respect to x and integration with respect to v. Only the even terms survive and

$$\int_{-\infty}^{\infty}b_{2r}(v)\,dv = \frac{1}{(2r)!}\left[\frac{d^{2r}}{dx^{2r}}\int_{-\infty}^{\infty}v^{2r}\exp\left\{-\frac{v^2w^2(ix)}{2x^2}\right\}dv\right]_{x=0} = \frac{(2\pi)^{1/2}}{2^rr!}\left[\frac{d^{2r}}{dx^{2r}}\left\{\frac{x}{w(ix)}\right\}^{2r+1}\right]_{x=0} = (2\pi)^{1/2}a_{2r},$$

putting $z = ix$ in (3.4).
A GENERAL THEORY OF DISCRIMINATION WHEN THE INFORMATION
ABOUT ALTERNATIVE POPULATION DISTRIBUTIONS 
IS BASED ON SAMPLES 


By C. RADHAKRISHNA Rao 


Indian Statistical Institute


1. Introduction. The problem of discrimination, that is of assigning an ob- 
served individual to its proper group, admits a simple solution when the distribu- 
tions of measurements in the alternative populations are completely specified. 
Research in this direction originated with the use of the linear discriminant func- 
tion introduced in 1936 by Fisher [3]. In 1939 Welch [24] showed that a general 
discriminant function in the case of two alternatives is the likelihood ratio of the 
two hypotheses, and is deducible either from Bayes’ theorem with given a priori 
probabilities or by the use of a lemma by Neyman and Pearson [11] when the
errors for the two hypotheses are minimised in any given ratio. 

A general theory of decision functions when the alternatives are finite or in- 
finite was developed by Wald [19] in 1939 and further generalized by him in 1949 
[23]. In 1945 von Mises [9] obtained, in the case of a finite number of alternatives, 
the solution to the problem of minimising the maximum error, which is the 
general theme of Wald’s work. Explicit solutions of Bayes’ form, with given a 
priori probabilities or ratio of errors for the alternative groups, and the construc- 
tion and use of a doubtful region were discussed by the author [13] in 1948. Re- 
lated problems and the extension to problems of selection have been treated in 
a subsequent series of papers [15], [16]. 

In all these cases the alternative population distributions are assumed to be 
completely specified. The decision rule consists in setting up a correspondence 
between values observed in a sample and the alternative population distributions. 
In practice it is rarely possible to specify completely the distributions, but they
may be estimable on the basis of independent samples from each of the alterna- 
tive distributions. 

Let $S_1, \ldots, S_k$ be independent samples from k alternative populations which may be partially specified, as when the functional forms of the probability densities are given but with unspecified parameters, or completely unspecified. After a sample S is drawn from a population known a priori to be one of the above set of k populations, the problem is to infer from which population the sample S has been drawn. The decision rule should be in the form of associating S with one of the samples $S_1, \ldots, S_k$, and declaring that S has come from the same population as the sample with which it is associated.

The usual practice is to estimate the alternative distributions on the basis of the sample information, and to use them in the solution which is strictly applicable when the alternatives are completely specified. This is probably the right approach when estimation is based on large samples. Fix and Hodges [4] have shown that this procedure is consistent under certain conditions, that is, with probability tending to unity it gives the same results as when the alternatives are known, provided the sample sizes are indefinitely increased. This procedure can be shown to be asymptotically the best in the sense of Wald [20].

Received 8/6/53, revised 1/22/54.

No systematic attempt seems to have been made to offer solutions for finite 
samples. Wald [22] proposed to solve this problem in the case of two alternatives 
by obtaining the distribution of the estimated likelihood ratio or the linear 
discriminant function of Fisher. Even if the distribution problem is satisfacto- 
rily solved, it cannot be applied in practice since it involves unknown parameters. 

In this paper some general methods have been developed with the help of 
which the discrimination problem can be solved, utilizing only the sample infor- 
mation. This theory is immediately applicable when the alternative distributions 
have given functional forms but with unspecified parameters. The nonparametric cases can be treated in a similar manner, but no attempt has been made in this paper to offer explicit solutions.


2. Statement of the problem. Let $p_1(x \mid \theta_1), \ldots, p_k(x \mid \theta_k)$ be k probability densities with known functional forms but unknown parameters. In the representation of the function $p(x \mid \theta)$, x stands for all the measurements and $\theta$ for all the unknown parameters. We have, in general, to deal with p-variate populations so that x stands for a vector of p stochastic variables. Samples of sizes $n_1, \ldots, n_k$ are available from these k populations. The observations from the ith population, for $i = 1, \ldots, k$, are denoted by

$$S_i:\; x_{ij} = (x_{ij1}, \ldots, x_{ijp}), \qquad j = 1, \ldots, n_i. \tag{2.1}$$

An individual known a priori to belong to one of the k groups has the measurements

$$S:\; x = (x_1, \ldots, x_p). \tag{2.2}$$


The problem is to assign this individual to its proper group on the basis of the 
information supplied only by the observations (2.1) and (2.2), without making 
any assumption about the unknown parameters. The problem is similar when, 
instead of p measurements on a single individual, the sample S in (2.2) consists 
of p measurements on each of n individuals drawn from that population. The 
problem is to decide on the population from which S has arisen, using the information supplied by S and $S_1, \ldots, S_k$ of (2.1).


3. Some observations on the solution when the parameters are known. If the a priori probabilities of the observed individual belonging to the k groups are $\pi_1, \ldots, \pi_k$, then the Bayes' solution which minimises the errors of wrong classification is to assign individuals with measurements x to the ith group if $\pi_ip_i(x \mid \theta_i)$ has the highest value in the set

$$\pi_1p_1(x \mid \theta_1), \ldots, \pi_kp_k(x \mid \theta_k). \tag{3.1}$$
The solution which assigns the individual to the ith group if $a_ip_i(x \mid \theta_i)$ has the highest value in the set

$$a_1p_1(x \mid \theta_1), \ldots, a_kp_k(x \mid \theta_k) \tag{3.2}$$

has the property of minimising the frequencies of wrong classification for the various groups in a ratio determined by the procedure (3.2). This ratio is a function of $a_1, \ldots, a_k$; if possible, the constants may be chosen for any specified ratio [13].
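Rules (3.1) and (3.2) differ only in the weights attached to the densities, so both can be sketched with one function. The Python below assumes, for illustration only, three univariate normal populations with unit variance and means 0, 2 and 5; the function names are hypothetical.

```python
import math


def normal_pdf(x, mean, sd=1.0):
    """Univariate normal density."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def weighted_rule(x, weights, densities):
    """Return the 0-based index i maximising weights[i] * densities[i](x).

    With weights equal to the prior probabilities pi_i this is rule (3.1);
    with arbitrary positive constants a_i it is rule (3.2); with all a_i = 1
    it is the maximum likelihood rule.
    """
    scores = [w * p(x) for w, p in zip(weights, densities)]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical example: unit-variance normal populations with means 0, 2 and 5.
densities = [lambda x, m=m: normal_pdf(x, m) for m in (0.0, 2.0, 5.0)]

if __name__ == "__main__":
    print(weighted_rule(1.1, [1, 1, 1], densities))        # maximum likelihood
    print(weighted_rule(1.1, [0.8, 0.1, 0.1], densities))  # Bayes' rule with priors
```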

When $\pi_1, \ldots, \pi_k$ are unknown or when the consideration of a priori probabilities is irrelevant, we have to depend on solution (3.2). One method is to choose the constants such that the errors are in an equal ratio, using the criterion of minimax [9], [23]. Another method is to choose $a_i = 1$ for all i, using the principle of maximum likelihood. The latter method gives an unbiased division of the space, that is, the probability with respect to the density $p_i$ of all observation points assigned to the jth population is the highest for $j = i$. All Bayes' solutions do not have this property except in the case of two alternative populations. Also, it is not evident whether the minimax solution is always unbiased in the above sense. Some criterion has to be developed for the choice of a rule from the subclass of Bayes' solutions which are unbiased.


4. Large sample theory. The observations (2.1) and (2.2) considered in Section 2 can be represented by a point in a space of $(n_1 + n_2 + \cdots + n_k + 1)p$ or, more generally, of $(n_1 + n_2 + \cdots + n_k + n)p$ dimensions. Every division of the space into k regions $R_1, \ldots, R_k$ provides a decision rule, by which the ith population is accepted when the points fall in the corresponding region $R_i$.

The probability of correct classification $\beta_i$ for the ith group is the probability of the region $R_i$ when the last observation (2.2) arises from the ith group. If the a priori probability that the last observation belongs to the ith group is $\pi_i$, then the probability of correct classification is

$$\pi_1\beta_1 + \cdots + \pi_k\beta_k. \tag{4.1}$$

This is obviously not greater than $\pi_1\beta_1' + \cdots + \pi_k\beta_k'$, where $\beta_i'$ are the values associated with the solution (3.1) when all the parameters are known a priori and samples do not provide any additional information.

Expression (4.1) is a function of the unknown parameters $(\theta_1, \ldots, \theta_k)$ and of the division D of the space of $N = (n_1 + n_2 + \cdots + n_k + 1)p$ dimensions. This function is denoted by $f_N(D, \theta_1, \ldots, \theta_k)$ or simply by $f_N(D, \theta)$. Let $L_N(D_1, D_2)$ represent the least upper bound of the difference $f_N(D_1, \theta) - f_N(D_2, \theta)$ corresponding to two divisions $D_1$ and $D_2$.

Following a concept due to Wald [20] we define a sequence of divisions $D^*$ to be asymptotically best if there does not exist any other sequence D such that

$$\limsup_{n_i\to\infty} L_N(D, D^*) > 0. \tag{4.2}$$

If there exists a sequence of divisions $D_n$ such that

$$f_N(D_n, \theta) \to \pi_1\beta_1' + \cdots + \pi_k\beta_k' \quad \text{as } n_i \to \infty \tag{4.3}$$

uniformly in the parameters as the sample sizes individually tend to infinity, then such a sequence automatically satisfies the criterion (4.2) for being best asymptotically. Fix and Hodges [4] have shown that for the solution

$$R_i:\; \pi_ip_i(x \mid \hat\theta_i) \ge \pi_jp_j(x \mid \hat\theta_j), \qquad j = 1, \ldots, k, \tag{4.4}$$

where $\hat\theta_1, \ldots, \hat\theta_k$ are uniformly consistent estimates of the parameters, the probability of correct classification tends uniformly to $\pi_1\beta_1' + \cdots + \pi_k\beta_k'$ as each sample size tends to infinity, provided the probability densities satisfy some mild regularity conditions. This result, together with the property of uniform consistency of maximum likelihood estimates (true under some general conditions stated by Wald [21]), provides a method of constructing an asymptotically best solution of the type (4.4).
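The plug-in solution (4.4) can be sketched as follows, assuming purely for illustration that the populations are univariate normal and that each mean and standard deviation is estimated from its own sample; the seed, sample sizes and names are hypothetical. With large samples the estimated boundary approaches the one given by (3.1) with the true parameters.

```python
import math
import random
import statistics


def normal_pdf(x, mean, sd):
    """Univariate normal density."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def plug_in_rule(samples, weights):
    """Rule (4.4) for normal populations: estimate (mean, sd) from each sample,
    then assign x to the group maximising weights[i] * p(x | estimates_i)."""
    fitted = [(statistics.fmean(s), statistics.stdev(s)) for s in samples]

    def assign(x):
        scores = [w * normal_pdf(x, m, sd) for w, (m, sd) in zip(weights, fitted)]
        return max(range(len(scores)), key=scores.__getitem__)

    return assign, fitted


# Hypothetical data: two unit-variance normal populations with means 0 and 3.
rng = random.Random(12345)
samples = [[rng.gauss(0.0, 1.0) for _ in range(5000)],
           [rng.gauss(3.0, 1.0) for _ in range(5000)]]
rule, fitted = plug_in_rule(samples, [0.5, 0.5])

if __name__ == "__main__":
    print(fitted)
    print([rule(x) for x in (-1.0, 0.5, 2.5, 4.0)])
```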


5. Small sample theory. Let us first consider the problem of two alternative groups. There are $n_1$ observations from the first group, $n_2$ from the second, and a single observation (each observation means a set of p measurements) on an individual whose group is unknown. If $\theta_1$ and $\theta_2$ are the parameters for the first and second groups, then the parameters applicable to the three sets of observations are

$$H_1:\; (\theta_1, \theta_2, \theta_1)$$

when the individual belongs to the first group and

$$H_2:\; (\theta_1, \theta_2, \theta_2)$$

otherwise. The two alternative hypotheses from which one is to be chosen on the basis of observations are, therefore, the vectors $(\theta_1, \theta_2, \theta_1)$ and $(\theta_1, \theta_2, \theta_2)$, whatever $\theta_1$ and $\theta_2$ may be.

5.1. Test for $H_2$ against $H_1$ at a fixed significance level. Let us choose one of these hypotheses (say $H_2$) as null and test it against the alternative $H_1$. For this we need critical regions in the space of $(n_1 + n_2 + 1)p$ observations which are similar with respect to the parameters $\theta_1$ and $\theta_2$ under the hypothesis $(\theta_1, \theta_2, \theta_2)$. Out of these, one which maximises the power with respect to the alternatives $(\theta_1, \theta_2, \theta_1)$ is to be chosen. How far this method yields successful results may be judged by a simple example.

Let $p_1(x \mid \theta_1)$ and $p_2(x \mid \theta_2)$ be univariate normal probability densities with unknown mean values $\theta_1$ and $\theta_2$ and unit standard deviation. From each population n observations are taken; the mean values are found to be $\bar x_1$ and $\bar x_2$. According to the null hypothesis, the last observation x belongs to the second group. In this case

$$T_1 = \bar x_1, \qquad T_2 = (x + n\bar x_2)/(1 + n)$$

are sufficient for $\theta_1$ and $\theta_2$. The critical region similar with respect to $\theta_1$ and $\theta_2$ has a conditional size $\alpha$ on the surfaces of constant values of $T_1$ and $T_2$.

If to these statistics is added $T_3 = x - \bar x_2$, then it is necessary to consider only the conditional distribution of $T_3$ given $T_1, T_2$. In fact, $T_3$ is distributed independently of $T_1, T_2$ under both hypotheses and has the densities proportional to

$$\exp\left\{-\frac{n}{2(n+1)}T_3^2\right\} \quad (H_2), \qquad \exp\left\{-\frac{n}{2(n+1)}(T_3 - \theta_1 + \theta_2)^2\right\} \quad (H_1),$$

whose ratio is independent of the observed values from the first group.


The test derived above is the same as that for testing whether the observation x comes from the second group when the alternatives are unspecified. The situation is somewhat unfortunate in that the test does not utilize the information given by the first sample. Perhaps this is inevitable if we have to come to decisions independently of any a priori knowledge, restricting ourselves to a fixed significance level. This, however, suggests an intuitive approach to the problem of classification.

Suppose that it is possible to test the hypothesis that the individual belongs to a specified group, say the ith (ignoring the fact that the alternatives are confined to a finite number about which we have some information), at any given probability level of rejection, and that all the critical regions corresponding to different probability levels are well ordered, the bigger containing the smaller. We denote by $\xi_i$ the least probability level at which the ith hypothesis can be rejected. The k groups supply k values $\xi_1, \ldots, \xi_k$, and it appears to be a reasonable rule to assign the individual to the jth group if $\xi_j$ is the maximum in the set. The optimum properties of this rule will naturally depend on the nature of tests of the above hypotheses, but the rule is generally applicable in situations where reasonable tests exist.

Consider for example the univariate case where the k samples provide the averages $\bar x_1, \ldots, \bar x_k$ based on sizes $n_1, \ldots, n_k$ and a pooled estimate $s^2$ of the variance based on $(\sum n_i - k)$ degrees of freedom. If x is the observation on an individual to be classified, we calculate the probabilities

$$\xi_i = P\left\{|t| > \frac{|x - \bar x_i|}{s\sqrt{1 + 1/n_i}}\right\},$$

where the variable t has Student's distribution based on $(\sum n_i - k)$ degrees of freedom. The individual is assigned to that group for which $\xi_i$ is a maximum.


This rule is immediately applicable since it involves no new technique. Only a 
reasonable test should exist and the probability integral table should be avail- 
able. It is, however, not easy to say what optimum properties are implied by this 
rule, except that errors are less for groups with larger sample sizes. 
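A minimal sketch of the Student-t rule above. Python's standard library has no t distribution, so the two-sided tail probability is computed here by Simpson's rule applied to the t density; that numerical choice, the function names and the example values are assumptions of this sketch.

```python
import math


def t_pdf(t, df):
    """Density of Student's t with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - 0.5 * (df + 1) * math.log1p(t * t / df))


def t_two_sided_tail(c, df, steps=4000):
    """P{|t| > c}, computed as 1 - 2 * integral_0^c t_pdf by Simpson's rule."""
    c = abs(c)
    if c == 0.0:
        return 1.0
    h = c / steps  # steps must be even for Simpson's rule
    s = t_pdf(0.0, df) + t_pdf(c, df)
    for k in range(1, steps):
        s += (4.0 if k % 2 else 2.0) * t_pdf(k * h, df)
    return max(0.0, 1.0 - 2.0 * s * h / 3.0)


def student_rule(x, means, sizes, s, df):
    """Return (index of the group with the largest xi_i, list of xi_i)."""
    xi = [t_two_sided_tail(abs(x - m) / (s * math.sqrt(1.0 + 1.0 / n)), df)
          for m, n in zip(means, sizes)]
    return max(range(len(xi)), key=xi.__getitem__), xi


if __name__ == "__main__":
    # Hypothetical example: two groups of sizes 5 and 50, pooled s = 1, df = 5 + 50 - 2.
    print(student_rule(2.5, [0.0, 5.0], [5, 50], 1.0, 53))
```

In the example call the observation lies midway between the two sample means, and $\xi$ comes out larger for the group with the smaller sample, because the factor $\sqrt{1 + 1/n_i}$ is larger there.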


Another intuitive method which may yield fruitful results is to use fiducial probability distributions (as defined by Fisher [2]) of the observation x, if they exist. Corresponding to k groups we can set up the k alternative fiducial distributions, using the samples. These distributions are parameter-free and the problem now reduces to the classical case of assigning the observation x to one of k populations whose distributions are completely defined. It would, however, be somewhat difficult to study the optimum properties of this procedure.

In the following we will lay down a few postulates concerning the nature of 
the decision rule, and obtain solutions which have optimum properties when the 
alternative hypotheses are close to one another. 

5.2. A general postulate concerning the decision rule. Let us denote the probability density of the observations from the ith group by

$$P_i(x^i \mid \theta_i) = p_i(x_{i1} \mid \theta_i)\cdots p_i(x_{in_i} \mid \theta_i).$$

For simplicity we shall consider only nonrandomised decision rules which need a division of the sample space of $(n_1 + \cdots + n_k + 1)p$ dimensions into mutually exclusive regions $R_1, \ldots, R_k$. The rule of behaviour is to accept the hypothesis that the individual belongs to the ith population when the sample point falls in $R_i$.

In developing the arguments we shall choose the case of two alternative populations only, the conclusions being the same for several. In this problem there are two regions $R_1$ and $R_2$. The proportion of errors committed when the individual belongs to the first group is

$$\alpha_1(\theta_1, \theta_2) = \int_{R_2}P_1(x^1 \mid \theta_1)\,P_2(x^2 \mid \theta_2)\,p_1(x \mid \theta_1)\,dv.$$

Similarly for the other group,

$$\alpha_2(\theta_1, \theta_2) = \int_{R_1}P_1(x^1 \mid \theta_1)\,P_2(x^2 \mid \theta_2)\,p_2(x \mid \theta_2)\,dv.$$


Suppose that we need a decision rule for which the linear compound of errors

$$\pi_1\alpha_1(\theta_1, \theta_2) + \pi_2\alpha_2(\theta_1, \theta_2) \tag{5.2.1}$$

is a minimum. The compounding coefficients $\pi_1$ and $\pi_2$ may be assigned a priori probabilities, or suitable weights may be attached to the errors. If there exists a division of the space which minimises (5.2.1) irrespective of the true values of the parameters, then such a division cannot be improved upon. The minimum value of (5.2.1) for any given values $\theta_1$ and $\theta_2$ of the parameters is attained for the regions

$$R_1:\; \pi_1p_1(x \mid \theta_1) \ge \pi_2p_2(x \mid \theta_2), \qquad R_2:\; \pi_2p_2(x \mid \theta_2) \ge \pi_1p_1(x \mid \theta_1).$$

If the boundary of these regions is independent of the parameters $\theta_1$ and $\theta_2$, then we have a uniformly best division of the space. In such a case the sample observations do not provide any additional information.

If we exclude such special cases, it would appear that whatever may be the set 
of regions offered it will not be uniformly the best for all values of the unknown 
parameters and can be good only in some restricted sense. We need then some 
reasonable postulates governing the choice of a decision rule. 
An obvious requirement on the decision rule is that it should not lead to contradictions or give recognizably bad results in particular cases. Let us consider the degenerate case when the two alternative distributions are identical, that is, $\theta_1 = \theta_2 = \theta$. For any division $R_1, R_2$ of the space, the errors committed for the two groups in this situation are $\alpha_1(\theta, \theta)$ and $\alpha_2(\theta, \theta)$, with the necessary condition $\alpha_1(\theta, \theta) + \alpha_2(\theta, \theta) = 1$. When the population distributions are equal the only rule is to assign individuals at random, subject to a given or a chosen frequency of errors for the two groups. It seems therefore reasonable to postulate that $\alpha_1(\theta, \theta)$ and $\alpha_2(\theta, \theta)$ should be constant independently of the common values of the parameters $\theta_1$ and $\theta_2$.

Further, let us imagine that for a given division of the space the value of $\alpha_1(\theta, \theta)$ at a neighbouring value $\theta + \delta\theta$ is greater, implying that

$$\alpha_1(\theta + \delta\theta, \theta + \delta\theta) - \alpha_1(\theta, \theta) = \delta\theta\left[\frac{\partial\alpha_1}{\partial\theta_1} + \frac{\partial\alpha_1}{\partial\theta_2}\right]_{\theta_1=\theta_2=\theta} = \delta\theta\,(a + b)$$

is positive, or, if $\delta\theta$ is positive, that the expression within the brackets is positive; here a and b denote the two partial derivatives at $(\theta, \theta)$. The value of $\alpha_1(\theta, \theta)$ at the value $\theta - \delta\theta$ is $\alpha_1(\theta, \theta) - \delta\theta(a + b)$, which is smaller than $\alpha_1(\theta, \theta)$.

Since we have assumed continuity of the functions involved, throughout a neighbourhood (over a square) around the point $(\theta, \theta)$, $\alpha_1(\theta_1, \theta_2)$ lies between $\alpha_1(\theta, \theta) \pm \delta\theta(a + b)/2$. Consequently, throughout this square around $(\theta, \theta)$, $\alpha_1(\theta_1, \theta_2)$ exceeds the value $\alpha_1(\theta - \delta\theta, \theta - \delta\theta)$ at the neighbouring point. It is clearly undesirable that more errors are committed when the populations are different than when they are equal, in any given region including the line of equality (at least as a boundary) in which the possible values of $(\theta_1, \theta_2)$ are restricted to lie. A necessary condition for avoiding this is that $(a + b)$ should vanish at all points on the line of equality, implying that $\alpha_1(\theta, \theta)$ and therefore $\alpha_2(\theta, \theta)$ should be constant independently of the common values.

We are not, at the moment, demanding that the functions $\alpha_1(\theta_1, \theta_2)$ and $\alpha_2(\theta_1, \theta_2)$ should be stationary or that they should be absolutely minimum on the line of equality, although these appear to be desirable properties leading to unbiased divisions of the space. It is, however, necessary that $\alpha_1(\theta, \theta)$ should be constant independently of $\theta$. In our arguments, we have explicitly used one parameter although we said that $\theta$ stands for a vector of parameters. This is clearly admissible since we can consider variations in one parameter keeping the others fixed.

The restriction that $\alpha_1(\theta, \theta)$ is constant on the line of equality implies that, with respect to the probability density of observations

$$P_1(x^1 \mid \theta)\,P_2(x^2 \mid \theta)\,p(x \mid \theta),$$

that is, when $\theta_1 = \theta_2 = \theta$, the regions $R_1$ and $R_2$ are similar to the sample space with respect to the free parameter $\theta$. In such a case we shall say that there exists a similar division of the sample space with respect to $\theta$.
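To illustrate what the similarity requirement asks for, the Monte Carlo sketch below estimates $\alpha_1(\theta, \theta)$ for a hypothetical rule, "assign x to group 1 if it is closer to $\bar x_1$ than to $\bar x_2$", with all observations drawn from a unit-variance normal population with mean $\theta$. Because the rule depends only on differences of observations, the same random draws lead to the same decisions for every $\theta$, so this division is similar with respect to $\theta$. The rule, sample sizes and replication count are assumptions of this illustration.

```python
import random


def alpha1_equal_pops(theta, n1=10, n2=10, reps=20000, seed=0):
    """Monte Carlo estimate of alpha_1(theta, theta) for the hypothetical rule
    'assign x to group 1 iff |x - xbar1| < |x - xbar2|', when both samples and
    the new observation come from N(theta, 1)."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(reps):
        xbar1 = theta + rng.gauss(0.0, 1.0) / n1 ** 0.5
        xbar2 = theta + rng.gauss(0.0, 1.0) / n2 ** 0.5
        x = theta + rng.gauss(0.0, 1.0)
        if abs(x - xbar1) >= abs(x - xbar2):  # x falls in R2: an error for group 1
            errors += 1
    return errors / reps


if __name__ == "__main__":
    print([alpha1_equal_pops(theta) for theta in (0.0, 3.0, 7.0)])
```

With $n_1 = n_2$ the estimate is also close to 0.50 by symmetry, which is the restriction discussed below.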


Having determined similar divisions, we have to select the best one among them. It is hard to imagine that there exist regions which minimise uniformly any linear compound of the errors $\pi_1\alpha_1(\theta_1, \theta_2) + \pi_2\alpha_2(\theta_1, \theta_2)$ except in some special cases. Some suitable criteria have to be used, as in Section 6, depending on the type of difficulties which the probability densities may present, to obtain reasonable solutions.

We have yet to consider the nature of the error functions on the line of equality, where the maximum error for any group cannot be reduced below 50 per cent. It may be reasonable to impose the restriction

$$\alpha_1(\theta, \theta) = \alpha_2(\theta, \theta) = 0.50.$$

In some problems the actual specification of the ratio of errors $\alpha_1(\theta, \theta)/\alpha_2(\theta, \theta)$ may be left open, and chosen to satisfy some optimum conditions. We could impose any other restriction specifying the ratio of errors at any value of the set $(\theta_1, \theta_2)$.

A special case is the choice $\alpha_2(\theta, \theta) = 0.05$, which leads to a test of significance of the null hypothesis $H_2$, that the observed individual belongs to the second group, against the alternative that he belongs to the first group. This will be useful in further subdividing the regions $R_1$ and $R_2$ in such a way that some portions lead to more confident classifications, while other portions permit only provisional decisions. Further theory is developed in the examples considered in the next section.

The arguments of this section can be extended to the case of more than two 
alternative populations. The division of the space into k regions must be such 
that the error committed for any group remains constant whenever the popula- 
tions are identical, whatever may be the common values of the parameters. As 
in the case of two populations, we may choose this constant to be 1/k for each of 
the alternative populations. Also, any ratio of these constants may be specified, 
or sometimes suitably determined. Problems of tests of significance may be con- 
sidered in a similar way. 

The general postulate laid down in this section can be used in the solution of 
a wide variety of problems in classification. For instance, the problem of the 
greater mean (Bahadur and Robbins, [1]) admits a neat solution once this condi- 
tion is imposed. 


6. Some optimum conditions and derivation of decision rules. It is known (Neyman [10]) that similar regions can be constructed, when a set of sufficient
statistics exist, by considering the relative probability density of the observa- 
tions, given the set of sufficient statistics. Lehmann and Scheffé [8] have shown 
recently that when the parameters admit a minimal set of sufficient statistics 
such that no function depending on them has zero expectation (in which case the 
set is said to be complete) then all similar regions have Neyman’s structure. 
That sufficient statistics possess this unicity property under some conditions has 
been formally demonstrated by the author [14]. In the illustrations considered 
in this paper, these results are used without proof. 
If T stands for the complete set of sufficient statistics for $\theta$, then we can write down the joint densities of the observations under $H_1$ and $H_2$ as

$$H_1:\; \Phi_1(T \mid \eta, \delta)\,P_1(x^1, x^2, x \mid \eta, \delta, T) = \Phi_1(\eta, \delta)\,P_1(\eta, \delta),$$

$$H_2:\; \Phi_2(T \mid \eta, \delta)\,P_2(x^1, x^2, x \mid \eta, \delta, T) = \Phi_2(\eta, \delta)\,P_2(\eta, \delta),$$

where $P_1$ and $P_2$ are relative probability densities of observations given T, and $\Phi_1$ and $\Phi_2$ are the densities of T, while $\eta$ and $\delta$ are the vectors $(\theta_1 + \theta_2)$ and $(\theta_1 - \theta_2)$.

The regions $R_{1T}$ and $R_{2T}$ on the surface of T for which the linear compound of overall errors $a\alpha_1(\theta_1, \theta_2) + b\alpha_2(\theta_1, \theta_2)$ is a minimum subject to the condition

$$\alpha_1(\theta, \theta)/\alpha_2(\theta, \theta) = 1/\rho, \tag{6.1}$$

where $\rho$ is fixed, are given by

$$R_{1T}:\; a\Phi_1(\eta, \delta)P_1(\eta, \delta) + \lambda_1P_1(\eta, 0) \ge b\Phi_2(\eta, \delta)P_2(\eta, \delta) + \lambda_2P_2(\eta, 0), \tag{6.2}$$

with the reverse relationship in $R_{2T}$. The constants $\lambda_1$ and $\lambda_2$ are determined to satisfy the condition (6.1). The proof of the result (6.2) and the subsequent ones follow from a lemma proved by the author in [16], p. 340. The region (6.2) will generally depend upon the unknown quantities $\eta$ and $\delta$, and is therefore not useful. We therefore need to restrict the regions by imposing some condition on the error functions.

We first note that the errors $\alpha_1(\theta_1, \theta_2)$ and $\alpha_2(\theta_1, \theta_2)$ could be written in terms of $\eta$ and $\delta$ as $\alpha_1(\eta, \delta)$ and $\alpha_2(\eta, \delta)$, using $\alpha_1$ and $\alpha_2$ as symbols for error functions. Let

$$\alpha_i'(\eta, \delta) = \frac{\partial}{\partial\delta}\alpha_i(\eta, \delta)$$

denote the derivatives with respect to the parameters $\delta$ in any given direction. The values $\alpha_1(\eta, 0)$ and $\alpha_2(\eta, 0)$ are the errors when the populations are identical, and the slopes of the error functions in the given direction at $\delta = 0$ are

$$\alpha_1'(\eta, 0), \qquad \alpha_2'(\eta, 0). \tag{6.3}$$

To ensure optimum properties, at least in the neighbourhood of the line of equality of the two populations, we may minimise a linear compound of the slopes (6.3), or minimise them in a given ratio. Observing that minimising the slopes (6.3) is equivalent to minimising the slopes corresponding to the relative errors on the surfaces of T, we find the boundary separating the best regions $R_{1T}$ and $R_{2T}$ on the surfaces of T as

$$aP_1'(\eta, 0) + \lambda_1P_1(\eta, 0) = bP_2'(\eta, 0) + \lambda_2P_2(\eta, 0), \tag{6.4}$$

where the primes denote differentiation with respect to $\delta$ in the given direction.
00 Cc 


(i) For any a and b and the choice of $\lambda_1$ and $\lambda_2$ to satisfy the condition (6.1), the linear compound $a\alpha_1'(\eta, 0) + b\alpha_2'(\eta, 0)$ is minimised. The special values $a = b$ may be useful in practice.

(ii) For a suitable choice of a, b, $\lambda_1$ and $\lambda_2$, the slopes $\alpha_1'(\eta, 0)$ and $\alpha_2'(\eta, 0)$ can be minimised in a given ratio in addition to the condition (6.1) being satisfied. The special case of the equality of the slopes may be of some practical interest.

The solution (6.4) may depend on $\eta$ when $P_1'(\eta, 0)$ and $P_2'(\eta, 0)$ contain $\eta$. In the illustrations considered in Section 7, the $P_i'(\eta, \delta)$ are functions of $\delta$ only, so that the solution (6.4) serves the purpose. Otherwise some method has to be devised, such as minimising the average slopes over a set of $\eta$ or considering regions similar for $\eta$ with respect to the functions $P_i(\eta, 0)$.

For the problem of testing the hypothesis $H_2$ against the alternative $H_1$, we have to construct a region $w$ on the $T'$ surfaces satisfying the four conditions (given $\gamma \le 0$)

(6.5)
(a)  $\int_w P_2(\delta = 0)\,dv = 0.05,$
(b)  $\int_w P_2'(\delta = 0)\,dv = \gamma,$
(c)  $\int_w P_2(\delta)\,dv \le 0.05,$
(d)  $\int_w P_1'(\delta = 0)\,dv = \text{a maximum}.$

The region satisfying the conditions (a), (b) and (d) is given by

(6.6)  $w:\ aP_1'(0) + \lambda_1P_1(0) \ge bP_2'(0) + \lambda_2P_2(0)$

on the $T'$ surfaces, where $a$, $b$, $\lambda_1$ and $\lambda_2$ are suitably chosen. For this region the slope of the conditional power curve $\beta_1(\delta)$ at $\delta = 0$ is a function of the $\gamma$ defined in condition (b). We now relax this condition and maximise $\beta_1'(0)$ subject to the condition $\gamma \le 0$. With this choice of $\gamma$ we can set up the region $w$ as in (6.6). If, for this region, condition (c) is satisfied, then we obtain a test of the hypothesis that $H_2$ is true against the alternative that $H_1$ is true. This test is most powerful in a given direction for small differences in the parameters of the two populations.

The situations in tests of significance and discriminatory problems are dia- 
grammatically represented in Figure 1. 

If the direction used in the above construction with the first derivatives is not justifiable, then we may try to impose further restrictions, such as unbiasedness of the error functions on the line of equality:

(6.7)  $\alpha_1'(\eta, 0) = 0, \qquad \alpha_2'(\eta, 0) = 0.$

We will assume that this condition implies that the derivatives of these errors vanish when $\delta = 0$ for all $T'$. The derivatives are calculated from the conditional probability densities, giving

(6.8)  $\left[\frac{\partial}{\partial\delta}\alpha_1(\eta, \delta, T)\right]_{\delta = 0} = 0, \qquad \left[\frac{\partial}{\partial\delta}\alpha_2(\eta, \delta, T)\right]_{\delta = 0} = 0,$

where $\int \alpha(\eta, \delta, T)P(T \mid \eta, \delta)\,dT = \alpha(\eta, \delta)$. In all the illustrations considered in Section 7 this condition is automatically satisfied. Otherwise it may be necessary to impose the conditions (6.8), which may be only sufficient for (6.7).

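We consider the second derivatives of the relative probability densities with respect to the elements of the vector of parameters $\delta = (\delta_1, \delta_2, \ldots)$. Defining, for $k = 1$ or $2$,

$P_k^{ij} = \frac{\partial^2}{\partial\delta_i\,\partial\delta_j}P_k(\eta, 0), \qquad P_k^{i} = \frac{\partial}{\partial\delta_i}P_k(\eta, 0),$

let us construct the regions

(6.9)  $R_1:\ \sum_i\sum_j a_{ij}P_1^{ij} + \lambda_{11}P_1^{1} + \lambda_{12}P_1^{2} + \cdots + \mu_1P_1 \ge \sum_i\sum_j b_{ij}P_2^{ij} + \lambda_{21}P_2^{1} + \lambda_{22}P_2^{2} + \cdots + \mu_2P_2,$

with the reverse relationship in $R_2$.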


FIG. 1. Power and error curves for tests of significance (left) and for problems of discrimination (right).

(i) The regions $R_1$ and $R_2$ minimise

$\sum_i\sum_j a_{ij}\frac{\partial^2}{\partial\delta_i\,\partial\delta_j}\alpha_1(\eta, \delta) + \sum_i\sum_j b_{ij}\frac{\partial^2}{\partial\delta_i\,\partial\delta_j}\alpha_2(\eta, \delta)$

at $\delta = 0$ for given $a_{ij}$ and $b_{ij}$, provided $\lambda_{ij}$ and $\mu_i$ are chosen to satisfy the condition (6.8) and a given ratio of errors when $\delta = 0$.

(ii) For a suitable choice of $a_{ij}$ and $b_{ij}$, the local powers of discrimination for the two groups can be made constant on the ellipses

(6.10)  $\sum_i\sum_j \gamma^{ij}\delta_i\delta_j = \text{constant}$

and their sum then maximised. Condition (6.10) implies that the second derivatives are in the ratio $\gamma^{ij}$.


(iii) By a suitable choice of $a_{ij}$ and $b_{ij}$ we could also construct a critical region $w$ of a given size such that the first derivatives of $\alpha_1(\eta, \delta)$ and $\alpha_2(\eta, \delta)$ vanish at $\delta = 0$ and that $\int_w \sum\sum\gamma^{ij}P_1^{ij}\,dv$ is maximised subject to the condition $\int_w \sum\sum\beta^{ij}P_2^{ij}\,dv \le 0$, where $\gamma^{ij}$ and $\beta^{ij}$ are assigned as in (6.10). Such a region can be used in testing the hypothesis $H_2$ against the alternative $H_1$, provided the region is so adjusted that its size under $H_2$ is 5 per cent when $\delta = 0$ and less than or equal to 5 per cent when $\delta \ne 0$.

Another alternative is to restrict attention to those regions which give the errors as functions of a distance $\Delta$ between the two populations. (Distance is a suitably defined function of the parameters of two populations. The construction of distance functions is discussed in two papers by the author [12], [14].) Even within this class, it may not be possible to obtain regions for which a given linear compound of the errors $a\alpha_1(\Delta) + b\alpha_2(\Delta)$ is minimised. In such a case, we may try to minimise

(6.11)  $a\frac{d\alpha_1(0)}{d\Delta} + b\frac{d\alpha_2(0)}{d\Delta}$

to obtain regions for discrimination. For tests of significance we may have to maximise $-d\alpha_1(0)/d\Delta$ subject to the conditions

$\frac{d\alpha_2(0)}{d\Delta} \le 0, \qquad \alpha_2(0) = 0.05.$

In Section 7, we shall show that such regions can be constructed.

Another possible approach is to consider decision rules which satisfy the prin- 
ciple of invariance [7]. It appears that some of the results obtained here can also 
be deduced by using this principle. 

The necessity of considering the derivatives in (6.11) arises only when no uniformly best regions exist in the class which gives the errors as functions of $\Delta$ only.

Besides the parameters $\theta$, which are considered to vary from population to population, there may be other unknown parameters $\phi$ which are the same for all populations. Thus we may consider the class of normal distributions with the same unknown variance but different mean values. In such situations, we may demand that the division of the space be similar for the unknown parameters $\phi$ also when the populations are identical in the $\theta$ parameters.

This introduces fresh complications in the applications of the results (6.4), 
(6.6), (6.9) and (6.11) for the derivation of optimum regions. Fortunately, in 
some cases the problems can be reduced in such a way that the above results are 
directly applicable, as shown in Section 7.2. 

One may argue that in laying down the decision rules, undue emphasis is laid 
on discriminating between populations which are close to one another. In the 
first place, this is done just to set up decision rules which do not involve the un- 
known parameters. In the absence of rules which are uniformly best, we can 
think only of rules which are best at some assigned values of the parameters, or 
at most for an assigned set of values. 


The requirement that the decision rule should possess some optimum properties in the neighbourhood of equality of the populations is not unrealistic, since in practice we often meet with alternatives which are closely related; the methods developed are best suited to such situations. It is, however, possible to deduce decision rules which have optimum properties for a given difference in the parameters of the two populations. These may be useful in some situations. Of course, whatever the rule offered, it is better to examine its performance for all possible differences in the parameters of the two populations and to satisfy oneself whether it can reasonably be applied in a given situation.


7. Illustrations. 


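7.1. Multivariate populations, dispersion matrix known. Let us consider $p$ characters and represent the relevant statistics computed from the three samples, and functions based on them, as follows:

Group                         First sample      Second sample     Sample to be classified
Sample size                   $n_1$             $n_2$             $n$
Average of $i$th character    $\bar x_{1i}$     $\bar x_{2i}$     $\bar x_i$
Population average            $\mu_{1i}$        $\mu_{2i}$        $\mu_i$

$\delta_i = \mu_{1i} - \mu_{2i}, \qquad Z_i = n_1\bar x_{1i} + n_2\bar x_{2i} + n\bar x_i,$
$T_i = \bar x_{1i} - \bar x_{2i}, \qquad U_i = \bar x_i - (n_1\bar x_{1i} + n_2\bar x_{2i})/(n_1 + n_2),$
$f_1 = (n_1 + n_2)/n_1n_2, \qquad f_2 = (n + n_1 + n_2)/n(n_1 + n_2),$
$g_1 = n_2/(n_1 + n_2), \qquad g_2 = -n_1/(n_1 + n_2),$
$q_1 = 1/f_1 + g_1^2/f_2, \qquad q_2 = 1/f_1 + g_2^2/f_2.$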


When $\mu_{1i} = \mu_{2i} = \mu_i$ for all $i$, the $Z_i$ are sufficient statistics for the $\mu_i$, and we need consider only the relative distribution of $T_i$, $U_i$ given $Z_i$. It is easy to see that $T_i$, $U_i$, and $Z_i$ are all independently distributed, so $Z_i$ can be dropped from further consideration if errors are restricted to functions of $\delta$ only. The joint probability density of $T_i$, $U_i$ under the hypothesis $H_1$ is


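$P_1(T, U, \delta) = \text{const}\,\exp\left[-\tfrac{1}{2}\sum_i\sum_j\alpha^{ij}\left\{\frac{(T_i - \delta_i)(T_j - \delta_j)}{f_1} + \frac{(U_i - g_1\delta_i)(U_j - g_1\delta_j)}{f_2}\right\}\right].$

Under $H_2$, we replace $g_1$ by $g_2$ to obtain $P_2(T, U, \delta)$ (the mean of $U_i$ is $g_1\delta_i$ under $H_1$ and $g_2\delta_i$ under $H_2$, while $T_i$ has mean $\delta_i$ under both).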

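In this problem we consider regions $R_1$ and $R_2$ whose size under both hypotheses depends only on the single parameter $\Delta = \sum\sum\alpha^{ij}\delta_i\delta_j$, since the formulae of Section 6, using the first and second derivatives, do not yield fruitful results. With this end in view, let us consider the surface integral

(7.1.1)  $G_1(\Delta) = \frac{1}{G}\int_S P_1(T, U, \delta)\,d\delta_1\cdots d\delta_p,$

where $S$ is the surface $\sum\sum\alpha^{ij}\delta_i\delta_j = \Delta$,

$P_1(T, U, \delta) = P_1(T, U, 0)\exp\left\{-\tfrac{1}{2}q_1\sum_i\sum_j\alpha^{ij}\left[(\delta_i - W_i)(\delta_j - W_j) - W_iW_j\right]\right\},$

$G = \int_S d\delta_1\cdots d\delta_p, \qquad W_i = (T_i/f_1 + g_1U_i/f_2)/q_1.$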


For the above integration, only the first expression in the exponential of (7.1.1) is important. This may be regarded as a $p$-variate normal distribution of $\delta_1, \ldots, \delta_p$. The integration then results in a noncentral $\chi^2$ probability density with noncentrality parameter $M_1$, given by

(7.1.2)  $e^{-\frac{1}{2}(M_1 + \chi^2)}\sum_{r=0}^{\infty}\frac{(M_1/2)^r}{r!}\,\frac{(\chi^2/2)^{\frac{1}{2}p + r - 1}}{2\Gamma(\frac{1}{2}p + r)}, \qquad M_1 = q_1\sum_i\sum_j\alpha^{ij}W_iW_j, \quad \chi^2 = q_1\Delta$

([16], pp. 51, 57). Observing that $G = \int_S d\delta_1\cdots d\delta_p$ is proportional to $\Delta^{\frac{1}{2}p - 1}$, and changing over to $\Delta$ in (7.1.2), we find the total integral in (7.1.1) for the two cases to be


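$G_1(\Delta) = P_1(T, U, 0)\,e^{-\frac{1}{2}q_1\Delta}\sum_{r=0}^{\infty}\frac{\Gamma(\frac{1}{2}p)}{r!\,\Gamma(\frac{1}{2}p + r)}\left(\frac{q_1M_1\Delta}{4}\right)^r,$

$G_2(\Delta) = P_2(T, U, 0)\,e^{-\frac{1}{2}q_2\Delta}\sum_{r=0}^{\infty}\frac{\Gamma(\frac{1}{2}p)}{r!\,\Gamma(\frac{1}{2}p + r)}\left(\frac{q_2M_2\Delta}{4}\right)^r,$

where $M_2$ is defined as $M_1$ with $g_2$ and $q_2$ in place of $g_1$ and $q_1$.

Restricting the minimisation of a linear compound of the errors to the divisions which yield errors as functions of $\Delta$ only, the boundary is obtained as

(7.1.3)  $aG_1(\Delta) = bG_2(\Delta).$

The proof of (7.1.3) is trivial ([16], p. 285). We have to make sure that for the regions based on (7.1.3) the errors are functions of $\Delta$ only. This follows from the invariance of the expressions $M_1$ and $M_2$. The solution (7.1.3) in general involves $\Delta$ and can be used only when $\Delta$ is known. We can, however, seek optimum properties in the neighbourhood of $\Delta = 0$, where

$\frac{dG_1(0)}{d\Delta} = q_1P_1(T, U, 0)\left(\frac{M_1}{2p} - \frac{1}{2}\right), \qquad \frac{dG_2(0)}{d\Delta} = q_2P_2(T, U, 0)\left(\frac{M_2}{2p} - \frac{1}{2}\right).$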
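The series for $G_k(\Delta)$ lends itself to a quick numerical check. The following Python sketch is illustrative only (the names `g_ratio` and `slope_at_zero` are hypothetical, not from the paper): it sums the series for $G_k(\Delta)/P_k(T, U, 0)$ and estimates the slope at $\Delta = 0$ by a central difference, which should agree with $q_k(M_k/2p - 1/2)$, the value obtained from the first two terms of the series. For $p = 3$ the series reduces to $\sinh s/s$ with $s^2 = q_kM_k\Delta$, which gives an independent check.

```python
import math


# Illustrative sketch; function names are hypothetical.
def g_ratio(delta, q, M, p, terms=80):
    """Truncated series for G_k(Delta) / P_k(T, U, 0).

    exp(-q*Delta/2) * sum_r Gamma(p/2) / (r! Gamma(p/2 + r)) * (q*M*Delta/4)**r
    """
    z = q * M * delta / 4.0
    term = 1.0  # the r = 0 term of the series
    total = 0.0
    for r in range(terms):
        total += term
        # ratio of consecutive terms: z / ((r + 1) * (p/2 + r))
        term *= z / ((r + 1) * (p / 2.0 + r))
    return math.exp(-q * delta / 2.0) * total


def slope_at_zero(q, M, p, h=1e-5):
    """Central-difference estimate of dG_k/dDelta at Delta = 0, per unit P_k(T, U, 0)."""
    return (g_ratio(h, q, M, p) - g_ratio(-h, q, M, p)) / (2.0 * h)


if __name__ == "__main__":
    q, M, p = 1.5, 3.0, 2
    print(slope_at_zero(q, M, p), q * (M / (2 * p) - 0.5))
```

The truncation at 80 terms is a convenience for moderate values of $q_kM_k\Delta$; larger arguments would need more terms.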


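Consider the boundary

(7.1.4)  $a\frac{dG_1(0)}{d\Delta} + \lambda_1P_1(T, U, 0) = b\frac{dG_2(0)}{d\Delta} + \lambda_2P_2(T, U, 0),$

or, since $P_1(T, U, 0) = P_2(T, U, 0)$ (the two densities coincide when $\delta = 0$), $aq_1M_1 - bq_2M_2 = c$. The choice $a = b$ leads to a minimum value of the sum of the derivatives of the errors. In this case the boundary is

(7.1.5)  $\sum_i\sum_j\alpha^{ij}\left\{\frac{2(g_1 - g_2)}{f_1f_2}T_iU_j + \frac{g_1^2 - g_2^2}{f_2^2}U_iU_j\right\} = 0.$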


For the case $g_1 = -g_2$ (that is, $n_1 = n_2$), equation (7.1.5) reduces to $\sum\sum\alpha^{ij}T_iU_j = 0$, so that the regions are

$R_1:\ \sum\sum\alpha^{ij}T_iU_j \ge 0, \qquad R_2:\ \sum\sum\alpha^{ij}T_iU_j \le 0,$

with fifty per cent errors when $\Delta = 0$. In this case the regions are uniformly best for all $\Delta$, because $G_1(\Delta) \ge G_2(\Delta)$ in $R_1$ and the reverse is true in $R_2$, irrespective of the value of $\Delta$. The appropriate regions when $n_1 \ne n_2$ have the boundary as in (7.1.5). For these regions the errors may not be fifty per cent when $\Delta = 0$. If this condition is also insisted upon, the region $R_1$ is taken as

(7.1.6)  $\sum_i\sum_j\alpha^{ij}\left\{\frac{2(g_1 - g_2)}{f_1f_2}T_iU_j + \frac{g_1^2 - g_2^2}{f_2^2}U_iU_j\right\} \ge c,$

where $c$ is suitably determined. We can also choose $a$, $b$, $\lambda_1$, and $\lambda_2$ in (7.1.4) subject to the condition that the derivatives of the errors are equal when $\Delta = 0$.
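To make the boundary concrete, the sketch below computes $q_1M_1 - q_2M_2$ directly from the sample means and classifies by its sign. This corresponds to $a = b$ with the constant in (7.1.4) taken as zero, an assumption made here for illustration; in general the constant is fixed by the prescribed ratio of errors. The function and argument names are hypothetical, and `alpha_inv` holds the known elements $\alpha^{ij}$. Since $q_kM_k = \sum\sum\alpha^{ij}(T_i/f_1 + g_kU_i/f_2)(T_j/f_1 + g_kU_j/f_2)$, the $T_iT_j$ terms cancel in the difference, leaving only $T_iU_j$ and $U_iU_j$ terms, and the accompanying check confirms this numerically.

```python
# Illustrative sketch; function and argument names are hypothetical.
def bilinear(A, x, y):
    """sum_i sum_j A[i][j] * x[i] * y[j]."""
    return sum(A[i][j] * x[i] * y[j] for i in range(len(x)) for j in range(len(y)))


def known_dispersion_rule(xbar1, xbar2, xbar, n1, n2, n, alpha_inv):
    """Return (q1*M1 - q2*M2, decision) for the a = b, zero-constant boundary.

    xbar1, xbar2: means of the two reference samples (length p)
    xbar: mean of the sample to be classified
    alpha_inv: elements alpha^{ij} of the inverse of the known dispersion matrix
    """
    p = len(xbar)
    T = [xbar1[i] - xbar2[i] for i in range(p)]
    U = [xbar[i] - (n1 * xbar1[i] + n2 * xbar2[i]) / (n1 + n2) for i in range(p)]
    f1 = (n1 + n2) / (n1 * n2)
    f2 = (n + n1 + n2) / (n * (n1 + n2))
    qM = []
    for g in (n2 / (n1 + n2), -n1 / (n1 + n2)):   # g1, then g2
        q = 1 / f1 + g ** 2 / f2
        W = [(T[i] / f1 + g * U[i] / f2) / q for i in range(p)]
        M = q * bilinear(alpha_inv, W, W)           # noncentrality M_k
        qM.append(q * M)
    D = qM[0] - qM[1]
    return D, ("first" if D >= 0 else "second")
```

With $n_1 = n_2$ the sign of the returned statistic is the sign of $\sum\sum\alpha^{ij}T_iU_j$, so the sketch reproduces the regions $R_1$, $R_2$ stated above.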
For tests of significance the critical region is of the form

(7.1.7)  $w:\ aM_1 - bM_2 \ge c,$

where $a$, $b$, and $c$ are determined such that

(7.1.8)  $\left[\frac{d}{d\Delta}P_1(aM_1 - bM_2 \ge c)\right]_{\Delta = 0}$

is a maximum subject to

$P(aM_1 - bM_2 \ge c \mid \Delta = 0) = 0.05, \qquad \left[\frac{d}{d\Delta}P_2(aM_1 - bM_2 \ge c)\right]_{\Delta = 0} \le 0.$

In the above expressions, $P_1$ stands for the probability according to the first hypothesis and $P_2$ for the second. The ultimate solution depends on the evaluation of the expressions (7.1.8). The problem needs further investigation.

In the univariate problem, if $(n_1 - n_2)$ is not large compared with $(n_1 + n_2)$, the regions for classification are obtained as special cases of (7.1.5) as

$R_1:\ TU \ge 0, \qquad R_2:\ TU \le 0, \qquad T = \bar x_1 - \bar x_2, \quad U = \bar x - (n_1\bar x_1 + n_2\bar x_2)/(n_1 + n_2),$

where $\bar x_1$ and $\bar x_2$ are the averages of the two samples and $\bar x$ that of the sample to be classified. The critical region for testing $H_2$ against $H_1$ is of the form $TU \ge c$, where $c$ is determined to ensure five per cent size when $\delta = 0$. The regions for classification depending on different combinations of $T$ and $U$ are diagrammatically represented in Figure 2.
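In the univariate case with known variance $\sigma^2$, $T$ and $U$ are independent normal variables with variances $f_1\sigma^2$ and $f_2\sigma^2$ when $\delta = 0$, so $TU/(\sigma^2\sqrt{f_1f_2})$ is distributed as the product of two independent standard normal variables. The sketch below uses this to find the constant $c$ in $TU \ge c$ for a five per cent size, computing the tail probability by Simpson's rule from $\Pr(XY > t) = 2\int_0^\infty\phi(x)\{1 - \Phi(t/x)\}\,dx$ and checking the result by simulation. The function names, the argument `sigma2` (the known variance), the integration grid and the simulation settings are choices made here for illustration.

```python
import math
import random


# Illustrative sketch; function names and numerical settings are hypothetical.
def classify(x1bar, x2bar, xbar, n1, n2):
    """Univariate rule: R1 (first population) if T*U >= 0, otherwise R2."""
    T = x1bar - x2bar
    U = xbar - (n1 * x1bar + n2 * x2bar) / (n1 + n2)
    return "first" if T * U >= 0 else "second"


def product_normal_tail(t, steps=4000, x_max=10.0):
    """P(X*Y > t) for t > 0 and independent standard normals X, Y (Simpson's rule)."""
    h = x_max / steps
    total = 0.0
    for k in range(1, steps + 1):           # the integrand is 0 at x = 0 when t > 0
        x = k * h
        phi = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
        upper = 0.5 * math.erfc(t / (x * math.sqrt(2.0)))   # 1 - Phi(t / x)
        weight = 1 if k == steps else (4 if k % 2 else 2)
        total += weight * phi * upper
    return 2.0 * total * h / 3.0


def critical_value(n1, n2, n, sigma2, size=0.05):
    """Constant c with P(TU >= c | delta = 0) = size, found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if product_normal_tail(mid) > size:
            lo = mid
        else:
            hi = mid
    f1 = (n1 + n2) / (n1 * n2)
    f2 = (n + n1 + n2) / (n * (n1 + n2))
    return sigma2 * math.sqrt(f1 * f2) * 0.5 * (lo + hi)


def simulated_size(n1, n2, n, sigma2, c, reps=100000, seed=1):
    """Monte Carlo estimate of P(TU >= c) when the two populations coincide."""
    rng = random.Random(seed)
    f1 = (n1 + n2) / (n1 * n2)
    f2 = (n + n1 + n2) / (n * (n1 + n2))
    sd_t, sd_u = math.sqrt(f1 * sigma2), math.sqrt(f2 * sigma2)
    hits = sum(rng.gauss(0.0, sd_t) * rng.gauss(0.0, sd_u) >= c for _ in range(reps))
    return hits / reps
```

For example, `critical_value(8, 6, 4, 2.0)` gives the cut-off for samples of sizes 8, 6 and 4 with $\sigma^2 = 2$, and `simulated_size` with that cut-off should come out close to 0.05.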

FIG. 2. Division of the space for different decisions.

7.2. Multivariate populations, dispersion matrix unknown. In addition to the statistics defined in Section 7.1 we need estimates of the dispersion elements when the populations are identical, that is, when $\delta_i = 0$ for all $i$. Let $S_{ij}$ denote the pooled sum of products within the three samples, with $(n_1 + n_2 + n - 3)$ degrees of freedom. When $\delta_i$ is zero, all the observations can be regarded as samples from the same population, so that we have estimates of the dispersion elements based on $(n_1 + n_2 + n - 1)$ degrees of freedom. If $B_{ij}$ denotes the corrected sum of products from the combined samples, then


$B_{ij} = S_{ij} + T_iT_j/f_1 + U_iU_j/f_2.$


The statistics $Z_i$ (defined in Section 7.1) and $B_{ij}$ are sufficient for the common mean values and the elements $\alpha_{ij}$ of the dispersion matrix. Similar divisions of the space are obtained by considering exclusive regions on the surfaces of constant values of $Z_i$ and $B_{ij}$, subject to some conditions. The probability density of $T_i$, $U_i$, and $S_{ij}$ under the hypothesis $H_1$ is


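$\text{const}\,|S_{ij}|^{m/2}\exp\left\{-\tfrac{1}{2}\sum_i\sum_j\alpha^{ij}\left[S_{ij} + \frac{(T_i - \delta_i)(T_j - \delta_j)}{f_1} + \frac{(U_i - g_1\delta_i)(U_j - g_1\delta_j)}{f_2}\right]\right\},$

where $m = n_1 + n_2 + n - p - 4$. Changing over to $T_i$, $U_i$, and $B_{ij}$ permits their joint density to be written as the product of

$P(B_{ij} \mid \delta = 0) = \text{const}\,|B_{ij}|^{(m+2)/2}\exp\left\{-\tfrac{1}{2}\sum_i\sum_j\alpha^{ij}B_{ij}\right\},$

$F(B, T, U) = \frac{|B_{ij} - T_iT_j/f_1 - U_iU_j/f_2|^{m/2}}{|B_{ij}|^{(m+2)/2}},$

and $e^{-\frac{1}{2}q_1\Delta}Q_1(\delta)$, where

$Q_1(\delta) = \exp\left[\sum_i(T_i/f_1 + g_1U_i/f_2)\zeta_i\right]$

and $\zeta_i = \alpha^{i1}\delta_1 + \cdots + \alpha^{ip}\delta_p$. The probability density under the second hypothesis is obtained by replacing $g_1$ and $q_1$ by $g_2$ and $q_2$ in the above expressions. We shall consider divisions $R_1$, $R_2$ for which the errors are a function of the Mahalanobis distance $\Delta = \sum\sum\alpha_{ij}\zeta_i\zeta_j$ only. This means

(7.2.1)  $\int P(B_{ij} \mid \delta = 0)\,dB\int_{R_1}e^{-\frac{1}{2}q_1\Delta}F(B, T, U)Q_1(\delta)\,dT\,dU = \beta_1(\Delta).$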


Following the arguments of Hsu [6] and Simaika [17] in a similar situation, we can show that condition (7.2.1) implies

$\int_{R_1}F(B, T, U)Q_1(\delta)\,dT\,dU = G_1(K),$

where $K = \sum\sum B_{ij}\zeta_i\zeta_j$. If we are minimising a linear compound of errors, it is enough to maximise $ae^{-\frac{1}{2}q_1\Delta}G_1(K) + be^{-\frac{1}{2}q_2\Delta}G_2(K)$ on the surfaces of $B_{ij}$, since the expected value of this linear compound integrated over $B_{ij}$ with density $P(B_{ij} \mid \delta = 0)$ gives the linear compound of correct classifications to be maximised. There is no hope of obtaining a solution without involving $\Delta$, except perhaps when $n_1 = n_2$. We shall therefore minimise a linear compound of the derivatives of the errors with respect to $\Delta$ at the value zero, or maximise

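(7.2.2)  $a\frac{d\beta_1(\Delta)}{d\Delta} + b\frac{d\beta_2(\Delta)}{d\Delta}$

at $\Delta = 0$. Evaluation of the three terms at $\Delta = 0$ yields

$e^{\frac{1}{2}q_1\Delta}\beta_1(\Delta) = \int P(B_{ij} \mid \delta = 0)\,dB\int_{R_1}F(B, T, U)Q_1(\delta)\,dT\,dU = \int P(B_{ij} \mid \delta = 0)\,G_1(K)\,dB,$

$\left[\frac{d}{d\Delta}\left\{e^{\frac{1}{2}q_1\Delta}\beta_1(\Delta)\right\}\right]_{\Delta = 0} = \frac{dG_1(0)}{dK}\,\frac{d}{d\Delta}\sum_i\sum_j\zeta_i\zeta_j\int B_{ij}P(B_{ij} \mid \delta = 0)\,dB = (m + p + 3)\frac{dG_1(0)}{dK},$

since $\int B_{ij}P(B_{ij} \mid \delta = 0)\,dB = (m + p + 3)\alpha_{ij}$; and

$\left[\frac{d}{d\Delta}\left\{e^{\frac{1}{2}q_1\Delta}\beta_1(\Delta)\right\}\right]_{\Delta = 0} = \tfrac{1}{2}q_1\beta_1(0) + \beta_1'(0).$

Consequently the value of (7.2.2) at $\Delta = 0$ is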


$(m + p + 3)\left\{a\frac{dG_1(0)}{dK} + b\frac{dG_2(0)}{dK}\right\} - \tfrac{1}{2}aq_1\beta_1(0) - \tfrac{1}{2}bq_2\beta_2(0).$

Since the latter terms depend only on the errors when $\Delta = 0$, we need only maximise the former, or the expression $a\,dG_1(0)/dK + b\,dG_2(0)/dK$ if possible, subject to given magnitudes of errors when $\Delta = 0$. Observing that

$G_1(K) = \int_{R_1}F(B, T, U)\exp\left\{\sum_i(T_i/f_1 + g_1U_i/f_2)\zeta_i\right\}dT\,dU,$


let us consider the surface integral over the surface $S$: $\sum\sum B_{ij}\zeta_i\zeta_j = K$,

(7.2.3)  $\frac{1}{G}\int_S F(B, T, U)\exp\left\{\sum_i(T_i/f_1 + g_1U_i/f_2)\zeta_i\right\}d\zeta_1\cdots d\zeta_p, \qquad G = \int_S d\zeta_1\cdots d\zeta_p.$

As in (7.1.1), the value of (7.2.3) is

(7.2.4)  $\text{const}\,F(B, T, U)\sum_{r=0}^{\infty}\frac{\Gamma(\frac{1}{2}p)}{r!\,\Gamma(\frac{1}{2}p + r)}\left(\frac{M_1K}{4}\right)^r,$

where

$M_1 = \sum_i\sum_j B^{ij}(T_i/f_1 + g_1U_i/f_2)(T_j/f_1 + g_1U_j/f_2),$

$B^{ij}$ being the elements of the matrix inverse to $(B_{ij})$. The derivative of (7.2.4) with respect to $K$ at $K = 0$ is const $M_1$, and the derivative corresponding to the second hypothesis is const $M_2$, where the two constants are the same. We can now define the boundary $aM_1 - bM_2 = c$ over the surfaces of $B_{ij}$. The constants $a$, $b$, and $c$ may be suitably chosen. For discrimination we might choose $a = b$ and $c = 0$, in which case the sum of the derivatives of the errors is minimised. We can choose $a$, $b$, and $c$ differently, as in the other cases considered in Section 7.1.

For tests of significance we need to determine $a$, $b$, and $c$ such that, over the surfaces of $B_{ij}$,

$\frac{d}{dK}\int_w F(B, T, U)Q_1(\delta)\,dT\,dU$

is a maximum at $K = 0$ subject to

$\int_w F(B, T, U)Q_2(0)\,dT\,dU = 0.05\int F(B, T, U)Q_2(0)\,dT\,dU, \qquad \left[\frac{d}{dK}\int_w F(B, T, U)Q_2(\delta)\,dT\,dU\right]_{K = 0} \le 0.$

Here $w$ is the region on the surfaces of $B_{ij}$ where $aM_1 - bM_2 \ge c$. It is easy to see that the distribution of the statistic $aM_1 - bM_2$ under either hypothesis depends on $\Delta$ only, thus ensuring the validity of the arguments used in the derivation of the regions.
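A sketch of the discrimination rule with $a = b$ and $c = 0$ for the unknown-dispersion case follows. It is an illustration rather than the author's procedure: the function names are hypothetical, and the inputs are the quantities $T_i$, $U_i$, $S_{ij}$, $f_1$, $f_2$, $g_1$, $g_2$ defined above. It forms $B_{ij} = S_{ij} + T_iT_j/f_1 + U_iU_j/f_2$, computes $M_k = \sum\sum B^{ij}v_{ki}v_{kj}$ with $v_{ki} = T_i/f_1 + g_kU_i/f_2$ by solving a linear system rather than inverting $B$ explicitly, and assigns the observation to the first population when $M_1 - M_2 \ge 0$.

```python
# Illustrative sketch; function names are hypothetical.
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [[float(v) for v in A[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):       # back substitution
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def unknown_dispersion_rule(T, U, S, f1, f2, g1, g2):
    """Return (M1, M2, decision) using B_ij = S_ij + T_i T_j / f1 + U_i U_j / f2."""
    p = len(T)
    B = [[S[i][j] + T[i] * T[j] / f1 + U[i] * U[j] / f2 for j in range(p)]
         for i in range(p)]
    M = []
    for g in (g1, g2):
        v = [T[i] / f1 + g * U[i] / f2 for i in range(p)]
        w = solve(B, v)                  # w = B^{-1} v
        M.append(sum(v[i] * w[i] for i in range(p)))
    decision = "first" if M[0] - M[1] >= 0 else "second"
    return M[0], M[1], decision
```

Choosing $a \ne b$ or $c \ne 0$ only changes the final comparison, so the same function could return $M_1$ and $M_2$ for any of the other boundaries considered in Section 7.1.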

The distribution problems connected with the test criteria developed here have yet to be tackled. Some results obtained by Wald [22], Harter [5] and Sitgreaves [18] on the reduction of the distribution of the discriminant function and of the difference of two quadratic forms will be extremely useful in the study of these problems.

A CONFIDENCE INTERVAL FOR VARIANCE COMPONENTS
By J. R. Green 
University of Liverpool 
1. Introduction. 


Summary. In this paper an approximate confidence interval is found for the expected value of the difference between two quantities which are independently distributed proportionally to $\chi^2$ variates. Three methods are used. The first is based on the work of Welch [13], [14] and Aspin [1], [2] on the generalized "Student's" problem, and involves neglecting successively higher powers of the reciprocal of one of the degrees of freedom. This method is used to check the other two solutions, which involve neglecting, respectively, successively higher and successively lower powers of a nuisance parameter. Finally a solution is formed from those resulting from the second and third methods; it is more accurate than either of those solutions. The order of accuracy, and the use of the final solution, are discussed.

The paper does not present a method of computing confidence intervals in a 
form suitable for immediate practical application. Series developments of a cer- 
tain hypothetical function are given; more remains to be said about the relation 
between the series and the function, and the problem of computing tables. A 
computational exploration of the solution is at present in hand. 

Applications. In what has sometimes been termed a Model II multiple classification, each observation is the sum of a constant and of contributions due to the different factors which feature in the classification, the interaction effects, and an error term. These contributions are taken to be normally and independently distributed with zero means, and variances independent of the particular levels of the appropriate factors. These variances are called variance components, since each gives that portion of the total variance of each observation appropriate to a particular source. In a balanced layout, each of these variance components, except that due to the error term, can be written as a known constant multiplied by the difference of the expected values of two mean squares, which are independently distributed proportionally to $\chi^2$ variates. Thus the results of this paper may be applied to these variance components.

In the other main model of multiple classifications, the so-called Model I, the factors make constant contributions to the observations at the different levels. Here all the mean squares except the residual are proportional to noncentral $\chi^2$ variates, so that the results of this paper cannot be applied. However, in a "Mixed Model", where some factors are as in Model I and some as in Model II, some mean squares will be suitable.

The general balanced Model II classification will be exemplified by considering the two-way layout. Let $y_{ijk}$ be the $k$th observation in the $i$th row and the $j$th column, and take $y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + e_{ijk}$, where $\mu$ is a constant, and $\alpha_i$, $\beta_j$, $\gamma_{ij}$, and $e_{ijk}$ are independent normal deviates with variances $\sigma_\alpha^2$, $\sigma_\beta^2$, $\sigma_\gamma^2$, and $\sigma_e^2$, respectively. The appropriate table is thus

Source           D.F.                Mean square     E(mean square)
Between rows     $a - 1$             $M_\alpha$      $\sigma_e^2 + n\sigma_\gamma^2 + nb\sigma_\alpha^2$
Between cols.    $b - 1$             $M_\beta$       $\sigma_e^2 + n\sigma_\gamma^2 + na\sigma_\beta^2$
Interaction      $(a - 1)(b - 1)$    $M_\gamma$      $\sigma_e^2 + n\sigma_\gamma^2$
Error            $ab(n - 1)$         $M_e$           $\sigma_e^2$

Consider the variance component $\sigma_\alpha^2$, for example. Now

$\sigma_\alpha^2 = (nb)^{-1}E(M_\alpha - M_\gamma),$

and $M_\alpha$ and $M_\gamma$ are independently distributed as

$(a - 1)^{-1}(\sigma_e^2 + n\sigma_\gamma^2 + nb\sigma_\alpha^2)\chi^2, \qquad \{(a - 1)(b - 1)\}^{-1}(\sigma_e^2 + n\sigma_\gamma^2)\chi^2,$

with $a - 1$ and $(a - 1)(b - 1)$ degrees of freedom respectively, so that the results of this paper may be applied to obtain confidence limits for $\sigma_\alpha^2$.
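For a balanced layout, the mean squares in the table and the point estimate $(M_\alpha - M_\gamma)/nb$ of $\sigma_\alpha^2$ implied by the relation above can be computed as in the following sketch. It assumes the data are held as a nested list `y[i][j][k]` with $a$ rows, $b$ columns and $n \ge 2$ observations per cell; the function names are hypothetical. The sketch only produces the inputs $M_\alpha$ and $M_\gamma$ to which the interval developed in this paper applies, and the point estimate it returns can be negative.

```python
# Illustrative sketch; function names are hypothetical.
def two_way_mean_squares(y):
    """Mean squares of a balanced two-way layout y[i][j][k]."""
    a, b, n = len(y), len(y[0]), len(y[0][0])
    cell = [[sum(y[i][j]) / n for j in range(b)] for i in range(a)]
    row = [sum(cell[i]) / b for i in range(a)]
    col = [sum(cell[i][j] for i in range(a)) / a for j in range(b)]
    grand = sum(row) / a
    ss_rows = n * b * sum((r - grand) ** 2 for r in row)
    ss_cols = n * a * sum((c - grand) ** 2 for c in col)
    ss_int = n * sum((cell[i][j] - row[i] - col[j] + grand) ** 2
                     for i in range(a) for j in range(b))
    ss_err = sum((v - cell[i][j]) ** 2
                 for i in range(a) for j in range(b) for v in y[i][j])
    return {
        "M_alpha": ss_rows / (a - 1),
        "M_beta": ss_cols / (b - 1),
        "M_gamma": ss_int / ((a - 1) * (b - 1)),
        "M_e": ss_err / (a * b * (n - 1)),
    }


def sigma_alpha_sq_estimate(y):
    """Point estimate (M_alpha - M_gamma) / (n b) of the row variance component."""
    b, n = len(y[0]), len(y[0][0])
    ms = two_way_mean_squares(y)
    return (ms["M_alpha"] - ms["M_gamma"]) / (n * b)
```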

It is well known that a confidence interval or confidence limit can be used to
provide a test of a hypothesis which postulates a particular value for the param- 
eter concerned; for if the hypothesis be accepted when the hypothetical value of 
the parameter lies inside the interval or on the appropriate side of a single con- 
fidence limit, then the probability of rejecting a true hypothesis is fixed at some 
chosen level. This is true for the limits found for K, the variance component for 
which an interval estimate is obtained. An examination of the power of this test 
may require the tabulation of the function derived in Section 7. However it 
seems reasonable to expect that, for a sufficiently large difference between the 
true and hypothetical values of K, the power of the test will be a monotonically 
increasing function of this difference, since the interval continues to cover the 
true value in the fixed proportion of cases. 

Crump [5] states three main fields of application of work on variance compo- 
nents. 

(i) The interpretation of significance tests. Here variance component estimates 
are used to locate the sources of undesirable variation, so that this variation can 
be partially or completely eliminated. Tippett [12] discusses significance tests in 
the analysis of variance in terms of these components, giving a numerical exam- 
ple of the quality control of spectacle glass. Daniels [8] gives an example from 
the woollen industry. 

(ii) The selection of efficient sampling designs. This is the most important use 
of variance components. Usually interest is focussed on a particular function of 
the observations, such as the grand mean. The reciprocal of the variance of this 
statistic is then regarded as a measure of the efficiency of the sampling design. 
Both the cost and the efficiency are functions of the sample sizes and variance 
components. The usual procedure for choosing a good design is to estimate the 
variance components from a preliminary experiment, and then, using these estimates instead of the true values, to calculate the sample sizes which either
minimise the cost for fixed efficiency or maximise the efficiency for fixed cost. 
Alternatively, Yates [15] suggests the general principle that an experiment 
should be so designed that the sum of the cost and the expected losses due to 
errors in the results should be minimised. Examples are given by Marcuse [10], 
and Nordskog and Crump [6]. 

(iii) Various problems in genetics. An example is given by Robinson and Comstock [4].

In all of these fields, point estimates of variance components are now used. 
They seem to be more appropriate than interval estimates for many of the ex- 
amples met in practice. However, a confidence interval is useful for assessing 
the accuracy of an estimate. If the confidence interval is wide, then little trust 
can be placed in a point estimate; if it is narrow, then the estimate can reasonably be regarded as trustworthy. Estimates do exist for the variances of the variance component estimates, but these, being estimates, are less reliable than confidence intervals for assessing the accuracy of the variance component estimates. Also, they are less informative, since the usual type of variance component estimate has a complicated distribution, involving a nuisance parameter (see K. Pearson [11]).

When variance components are used qualitatively to assess the amount of 
variation present, a confidence interval may be a more reliable guide to the 
judgement. 

Previous work. A full discussion of previous work would require too much space, and anything brief is scarcely illuminating. Attention is directed to papers by Fisher [9] and Bross [3], and to a comprehensive survey by Crump [7]. In this paper we do not follow Fisher's method of computing fiducial limits.


2. The problem. The previous problems may be subsumed under the following canonical form. Two statistics $M_1$ and $M_2$ are given, which are independently distributed as $\sigma_1^2\chi^2/r_1$ and $\sigma_2^2\chi^2/r_2$, with $r_1$ and $r_2$ degrees of freedom, respectively. Confidence limits are required for $\sigma_1^2 - \sigma_2^2$, both $\sigma_1^2$ and $\sigma_2^2$ being unknown. For the present, it will be assumed that $\sigma_1^2 > \sigma_2^2$, but this restriction will be withdrawn later, as discussed in Section 9.

We define

$x = \frac{M_2}{\sigma_1^2 - \sigma_2^2}, \qquad y = \frac{M_1}{\sigma_1^2 - \sigma_2^2}, \qquad \rho = \frac{\sigma_2^2}{\sigma_1^2 - \sigma_2^2}.$

Thus a function $f$ is sought such that

(2.1)  $\Pr[y \le f(x)] = \alpha, \qquad 0 < \alpha < 1,$

where $\alpha$ is given and $f$ must be independent of the nuisance parameter $\rho$. Later we shall require to find $K$ such that $M_1/K = f(M_2/K)$. The problem was put into this form originally so that the method of approach due to Welch (later referred to as Method I) might be exploited.

Now $r_2x/\rho$ and $r_1y/(1 + \rho)$ are independently distributed as $\chi^2$ on $r_2$ and $r_1$ degrees of freedom, respectively. With $y_1 = r_1y/(1 + \rho)$, requirement (2.1) becomes

(2.2)  $\int_0^\infty\left(\frac{r_2}{2\rho}\right)^{r_2/2}\frac{x^{r_2/2 - 1}e^{-r_2x/2\rho}}{\Gamma(\frac{1}{2}r_2)}\left[\int_0^{r_1f(x)/(1 + \rho)}\frac{(\frac{1}{2}y_1)^{\frac{1}{2}r_1 - 1}e^{-\frac{1}{2}y_1}}{2\Gamma(\frac{1}{2}r_1)}\,dy_1\right]dx = \alpha.$

Since $x$ and $y$ are nonnegative, the discussion is confined entirely to the first quadrant of the plane of $x$ and $y$. Thus it is essential that $f(x) \ge 0$ (see Section 9). Put

$I_r(z) = \int_0^z\frac{(\frac{1}{2}y)^{\frac{1}{2}r - 1}e^{-\frac{1}{2}y}}{2\Gamma(\frac{1}{2}r)}\,dy, \qquad g(x) = I_{r_1}\{r_1f(x)/(1 + \rho)\},$

so that $I_r$ is the $\chi^2$ distribution function with $r$ degrees of freedom. Further, let $\xi$ be such that $I_{r_1}(\xi) = \alpha$. Thus an $f$ is required such that

(2.3)  $\int_0^\infty\frac{e^{-r_2x/2\rho}}{\Gamma(\frac{1}{2}r_2)}\left(\frac{r_2x}{2\rho}\right)^{r_2/2 - 1}\frac{r_2}{2\rho}\,g(x)\,dx = \alpha$

independently of $\rho$. We do not know whether there exists a function $f(x)$ which satisfies the above conditions nor, if it exists, whether it is regular. In this paper we derive a function $f_{IV}(x)$ such that (2.2) is approximately satisfied when $f_{IV}(x)$ is inserted in place of $f(x)$. How good this approximation is can be determined only by computational means.


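3. Method I. In equation (2.3) we expand $g(x)$ in a Taylor series about $x = \rho$. That is, we confine the investigation to the finding of a solution for which this is permissible, if one exists. Now $g(x) = e^{(x - \rho)\partial}g(w)$, where $\partial^r = [\partial^r/\partial w^r]_{w = \rho}$. Thus (2.3) becomes

$\int_0^\infty\frac{e^{-r_2x/2\rho}}{\Gamma(\frac{1}{2}r_2)}\left(\frac{r_2x}{2\rho}\right)^{r_2/2 - 1}\frac{r_2}{2\rho}\,e^{(x - \rho)\partial}\,dx\;g(w) = \alpha.$

Since $x$ is distributed as $\rho\chi^2/r_2$, whose moment generating function is $(1 - 2\rho t/r_2)^{-r_2/2}$, the integral is $\theta g(w)$ with $\theta = (1 - 2\rho\partial/r_2)^{-r_2/2}e^{-\rho\partial}$, and the condition becomes $\theta I_{r_1}\{r_1f(w)/(1 + \rho)\} = \alpha$. Expansion of $I_{r_1}\{r_1f(w)/(1 + \rho)\}$ about $r_1f(w)/(1 + \rho) = \xi$ yields

$I_{r_1}\left\{\frac{r_1f(w)}{1 + \rho}\right\} = \sum_{s=0}^{\infty}\frac{1}{s!}\left\{\frac{r_1f(w)}{1 + \rho} - \xi\right\}^sI_{r_1}^{(s)}(\xi) = \exp\left[\left\{\frac{r_1f(w)}{1 + \rho} - \xi\right\}D\right]I_{r_1}(z),$

where $D = d/dz$ and the result is evaluated at $z = \xi$. Hence the equation to be solved becomes

(3.1)  $\theta\exp\left(\left\{[r_1f(w)/(1 + \rho)] - \xi\right\}D\right)I_{r_1}(z) = \alpha.$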

Equation (2.2) is very similar to the one solved approximately by Welch [13], [14] and Aspin [1], [2] in their work on the problem of comparing two means; the method used here is the same as theirs. The different functional form of the inner integrand's upper limit, and the different type of the inner integral, prevent our solution from being derived from Welch's, although a comparison provides a means of checking for $r_1 = 1$. It is evident that $\theta$ is essentially the same in both cases.

Continuing, we put $f = f_0 + f_1 + f_2 + \cdots$, where $f_s$ is of order $-s$ in $r_2$, and the expansion may be finite or infinite. The quantity $f_0$ is, as in Welch's work, the large sample approximation, here $\xi(1 + x)/r_1$, since as $r_2 \to \infty$, $x \to \rho$ and (2.3) reduces to $I_{r_1}\{r_1f(\rho)/(1 + \rho)\} = \alpha$. Expanding,

$\theta = \exp\{-\rho\partial - \tfrac{1}{2}r_2\log(1 - 2\rho\partial/r_2)\}$
$\quad = \exp\{\rho^2\partial^2/r_2 + 4\rho^3\partial^3/3r_2^2 + \cdots\}$
$\quad = 1 + \rho^2\partial^2/r_2 + \{4\rho^3\partial^3/3 + \rho^4\partial^4/2\}/r_2^2 + \cdots.$


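Neglecting terms of order $r_2^{-3}$ in (3.1), we have

$\theta\exp\left[\left\{\xi\left(\frac{1 + w}{1 + \rho} - 1\right) + \frac{r_1f_1(w)}{1 + \rho} + \frac{r_1f_2(w)}{1 + \rho}\right\}D\right]I_{r_1}(z) = I_{r_1}(\xi).$

Substituting for $\theta$, and grouping separately terms of order $r_2^{-1}$ and $r_2^{-2}$, we obtain

$\left[\frac{\rho^2\partial^2}{r_2}\exp\left\{\xi D\left(\frac{1 + w}{1 + \rho} - 1\right)\right\} + \frac{r_1f_1(w)D}{1 + \rho}\right]I_{r_1}(z)$
$\quad + \left[\frac{1}{2}\left\{\frac{r_1f_1(\rho)D}{1 + \rho}\right\}^2 + \frac{r_1f_2(\rho)D}{1 + \rho} + \frac{\rho^2\partial^2}{r_2}\exp\left\{\xi D\left(\frac{1 + w}{1 + \rho} - 1\right)\right\}\frac{r_1f_1(w)D}{1 + \rho}\right.$
$\qquad \left. + \left(\frac{4\rho^3\partial^3}{3r_2^2} + \frac{\rho^4\partial^4}{2r_2^2}\right)\exp\left\{\xi D\left(\frac{1 + w}{1 + \rho} - 1\right)\right\}\right]I_{r_1}(z) = 0.$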

Equating to zero the first order term yields

$\frac{r_1f_1(\rho)}{1 + \rho}I_{r_1}'(\xi) + \frac{\rho^2\xi^2}{r_2(1 + \rho)^2}I_{r_1}''(\xi) = 0.$

Therefore $f_1(\rho) = -\rho^2\xi^2I_{r_1}''(\xi)/\{r_1r_2(1 + \rho)I_{r_1}'(\xi)\}$. We put $R_s = I_{r_1}^{(s+1)}(\xi)/I_{r_1}'(\xi)$, so that

$f_1(x) = -\xi^2x^2R_1/\{r_1r_2(1 + x)\}.$

Equating to zero the second order term yields

$f_2(x) = -\frac{\xi^2x^2}{6r_1r_2^2(1 + x)^3}\left\{\xi x^2\left[3R_1(\xi R_1^2 - 2\xi R_2 - 4R_1) + 8R_2 + 3\xi R_3\right] + 8\xi x(R_2 - 3R_1^2) - 12R_1\right\}.$
It is required to express $R_s$ in terms of $\xi$ and $r_1$. Now

$I_{r_1}(\xi) = \int_0^\xi\frac{(\frac{1}{2}y)^{\frac{1}{2}r_1 - 1}e^{-\frac{1}{2}y}}{2\Gamma(\frac{1}{2}r_1)}\,dy, \qquad I_{r_1}'(\xi) = \frac{(\frac{1}{2}\xi)^{\frac{1}{2}r_1 - 1}e^{-\frac{1}{2}\xi}}{2\Gamma(\frac{1}{2}r_1)}.$

Using Leibniz's formula,

$I_{r_1}^{(s+1)}(\xi) = \frac{e^{-\frac{1}{2}\xi}}{2\Gamma(\frac{1}{2}r_1)}\sum_{j=0}^{s}\binom{s}{j}\frac{(\frac{1}{2}r_1 - 1)!}{(\frac{1}{2}r_1 - 1 - j)!}\left(\frac{1}{2}\right)^j\left(\frac{1}{2}\xi\right)^{\frac{1}{2}r_1 - 1 - j}\left(-\frac{1}{2}\right)^{s - j},$

so that

$R_s = (2\xi)^{-s}\left\{(-\xi)^s + s(-\xi)^{s-1}(r_1 - 2) + \binom{s}{2}(-\xi)^{s-2}(r_1 - 2)(r_1 - 4) + \cdots + (r_1 - 2)(r_1 - 4)\cdots(r_1 - 2s)\right\}.$
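The Leibniz expansion $R_s = (2\xi)^{-s}\sum_{j=0}^{s}\binom{s}{j}(-\xi)^{s-j}(r_1 - 2)(r_1 - 4)\cdots(r_1 - 2j)$ can be checked against direct differentiation of the $\chi^2$ density $I_{r_1}'$. In the sketch below (function names hypothetical), `R` evaluates that sum and `R2_numeric` approximates $I_{r_1}^{(3)}(\xi)/I_{r_1}'(\xi) = R_2$ by a central second difference of the density. The case $s = 1$ reduces to $R_1 = (r_1 - 2 - \xi)/2\xi$, which is what turns $f_1(x) = -\xi^2x^2R_1/\{r_1r_2(1 + x)\}$ into $\xi x^2(\xi - r_1 + 2)/\{2r_1r_2(1 + x)\}$.

```python
import math


# Illustrative sketch; function names are hypothetical.
def chi2_density(z, r):
    """I_r'(z): the chi-square density with r degrees of freedom."""
    return (z / 2.0) ** (r / 2.0 - 1.0) * math.exp(-z / 2.0) / (2.0 * math.gamma(r / 2.0))


def R(s, xi, r1):
    """R_s = I^{(s+1)}(xi) / I'(xi) from the Leibniz expansion."""
    total = 0.0
    for j in range(s + 1):
        falling = 1.0
        for k in range(1, j + 1):
            falling *= r1 - 2 * k            # (r1 - 2)(r1 - 4)...(r1 - 2j)
        total += math.comb(s, j) * (-xi) ** (s - j) * falling
    return total / (2.0 * xi) ** s


def R2_numeric(xi, r1, h=1e-3):
    """Central-difference estimate of I'''(xi) / I'(xi), i.e. of R_2."""
    d2 = (chi2_density(xi + h, r1) - 2.0 * chi2_density(xi, r1)
          + chi2_density(xi - h, r1)) / h ** 2
    return d2 / chi2_density(xi, r1)
```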
Substituting for $R_s$ in the $f$'s, we obtain

$f_0(x) = \frac{\xi(1 + x)}{r_1}, \qquad f_1(x) = \frac{\xi x^2(\xi - r_1 + 2)}{2r_1r_2(1 + x)},$

$f_2(x) = \frac{\xi x^2}{24r_1r_2^2(1 + x)^3}\left[x^2\{4\xi^2 - 11\xi(r_1 - 2) + (r_1 - 2)(7r_1 - 10)\} + 16x\{\xi^2 - 2\xi(r_1 - 2) + (r_1 - 1)(r_1 - 2)\} + 24(r_1 - 2 - \xi)\right].$


For further terms, operate on

$\exp\left[\left\{\frac{r_1\{f_0(w) + f_1(w) + \cdots + f_r(w)\}}{1 + \rho} - \xi\right\}D\right]I_{r_1}(z)$

by $\theta$, and arrange the result as a power series in $1/r_2$, say $\sum_sA_s(\rho)r_2^{-s}$. Then

$r_1f_{r+1}(\rho)I_{r_1}'(\xi)/(1 + \rho) = -A_{r+1}(\rho)/r_2^{r+1},$

whence $f_{r+1}(\rho)$. Now this expansion is in descending powers of $r_2$; but though $r_2$ may be large compared with $r_1$, it may not be large compared with certain powers of $r_1$ which may occur in the numerators of the $f$'s, or compared with $\xi$. This matter is considered in Section 8.

Only the terms shown above have been worked out by this method, as the 
calculations become laborious and this solution is used only as a check on the 
solutions obtained by other methods. 

Another point regarding this solution is that in replacing $\rho$ by $x$ it has been assumed that a solution $f(x)$ exists which is independent of $\rho$, whereas the function $f(w)$ which is operated upon by $\theta$ may actually be of the form $f(w, \rho)$. However, if such a solution exists, then this method will give it. Moreover, $f = f_0 + f_1 + f_2$ does satisfy (2.2) to order $r_2^{-2}$, whether or not an exact solution exists. A check has been made in the case $r_1 = 1$, when it is possible to deduce the appropriate series from Welch's solution of the two means problem.
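Since the quality of the approximation can be judged only by computation, a numerical check of the kind below may be useful. It is a sketch under stated assumptions, not a computation from the paper: it takes $r_1$ even so that $I_{r_1}$ has the closed form $1 - e^{-z/2}\sum_{k < r_1/2}(z/2)^k/k!$, finds $\xi$ by bisection, and evaluates $\Pr[y \le f(x)]$ in the integral form (2.3) by Simpson's rule, using either $f_0$ alone or $f_0 + f_1 + f_2$ as given above. The function names, the integration range and the grid size are illustrative choices.

```python
import math


# Illustrative sketch; function names and numerical settings are hypothetical.
def chi2_cdf_even(z, r):
    """I_r(z) for even r: 1 - exp(-z/2) * sum_{k < r/2} (z/2)**k / k!."""
    if z <= 0.0:
        return 0.0
    half = z / 2.0
    term, total = 1.0, 1.0
    for k in range(1, r // 2):
        term *= half / k
        total += term
    return 1.0 - math.exp(-half) * total


def chi2_quantile_even(alpha, r):
    """xi with I_r(xi) = alpha, by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf_even(hi, r) < alpha:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_cdf_even(mid, r) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def f_series(x, xi, r1, r2, terms=3):
    """f0, f0 + f1 or f0 + f1 + f2 (terms = 1, 2, 3) from Method I."""
    f0 = xi * (1.0 + x) / r1
    f1 = xi * x ** 2 * (xi - r1 + 2) / (2.0 * r1 * r2 * (1.0 + x))
    f2 = xi * x ** 2 / (24.0 * r1 * r2 ** 2 * (1.0 + x) ** 3) * (
        x ** 2 * (4 * xi ** 2 - 11 * xi * (r1 - 2) + (r1 - 2) * (7 * r1 - 10))
        + 16 * x * (xi ** 2 - 2 * xi * (r1 - 2) + (r1 - 1) * (r1 - 2))
        + 24 * (r1 - 2 - xi))
    return (f0, f0 + f1, f0 + f1 + f2)[terms - 1]


def coverage(alpha, r1, r2, rho, terms, steps=4000):
    """Pr[y <= f(x)] from (2.3), integrating over the density of x by Simpson's rule."""
    xi = chi2_quantile_even(alpha, r1)
    x_max = 8.0 * rho                        # well beyond the bulk of x ~ rho * chi2 / r2
    h = x_max / steps
    log_const = (r2 / 2.0) * math.log(r2 / (2.0 * rho)) - math.lgamma(r2 / 2.0)
    total = 0.0
    for k in range(1, steps + 1):            # density of x vanishes at 0 when r2 > 2
        x = k * h
        dens = math.exp(log_const + (r2 / 2.0 - 1.0) * math.log(x) - r2 * x / (2.0 * rho))
        g = chi2_cdf_even(r1 * f_series(x, xi, r1, r2, terms) / (1.0 + rho), r1)
        weight = 1 if k == steps else (4 if k % 2 else 2)
        total += weight * dens * g
    return total * h / 3.0
```

For example, with $\alpha = 0.95$, $r_1 = 4$, $r_2 = 20$ and $\rho = 1$, the corrected series should give a coverage much closer to 0.95 than $f_0$ alone; repeating the computation over a range of $\rho$ is one way to probe the remaining dependence on the nuisance parameter.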


4. Graphical representation. At this stage, a picture helps one to visualize 
Methods II and III, described in the subsequent sections. For simplicity, put 
v = dro, u= dry, a = }r, — 1, b = 4r, — 1. 


The joint probability density function of $u$ and $v$ is
$$\frac{1}{a!\,b!}\left(\frac{1}{1+p}\right)^{a+1}\left(\frac{1}{p}\right)^{b+1}u^a v^b\exp\left[-\frac{u}{1+p} - \frac{v}{p}\right],$$


where $a! = \Gamma(a + 1)$ whether or not $a$ is an integer. It is required to find $g(u, v)$ such that the integral of this density function over the region $g(u, v) \le 0$ equals $\alpha$, that is, such that
$$\frac{1}{a!\,b!}\iint_D u^a v^b e^{-u-v}\,du\,dv = \alpha, \qquad D = \{(u, v) : g[u(1 + p), vp] \le 0\}.$$


This equation shows that $g$ is required such that, when the curve $g(u, v) = 0$ is scaled down by dividing the $u$-value of every point by $(1 + p)$ and the $v$-value by $p$, the integral of $(1/a!\,b!)\,u^a v^b e^{-u-v}$ over the region on one side of the resulting curve is $\alpha$, independently of $p$. Also, when $g(u, v) = 0$, only one value of $v$ corresponds to each value of $u$.

When $p \to 0$, the slope becomes very small and the curve flattens out to the form $u = \text{constant}$. If the integral under the curve is $\alpha$, then the constant value of $u$ will be $\tfrac12\xi$, where $\xi$ is such that $I_{r_1}(\xi) = \alpha$.

When $p \to \infty$, the curve cannot lie completely in the range $u \le U$ for any finite $U$. If it did, the scaled curve would lie completely in the range $u \le U/(1 + p)$, which becomes arbitrarily small as $p \to \infty$. Thus the integral on one side of the curve could be made arbitrarily small, or arbitrarily close to 1, as the case may be. Similarly, the curve cannot lie wholly in the region $v \le V$ for any finite $V$. Thus the curve extends to infinity in both variables.

Further, if we look for a solution whose slope tends to a definite limit (not necessarily finite) for large $u$ and $v$ (if such a solution exists), then for large $p$ the shape of the scaled curve will be roughly that of a straight line through the origin, with slope equal to the slope at infinity. The integral on one side of this line, say below it, must then be $\alpha$, that is, $\Pr(y/x \le m/r_1) = \alpha$. Since in this limit $y/x$ is distributed as $F_{r_1, r_2}$ (i.e. the $F$ variate with degrees of freedom $r_1$, $r_2$),
$$m = r_1F_{r_1, r_2}(\alpha), \qquad \text{where } \Pr[F_{r_1, r_2} \le F_{r_1, r_2}(\alpha)] = \alpha.$$


From this crude picture of the approximations for very large and very small $p$, the first stages of the approximations derived in Methods II and III can be obtained. The picture also gives a better understanding of these methods.

The same conclusions are reached by the following intuitive reasoning.

Recall that $p = \sigma_2^2/K$ and $K = \sigma_1^2 - \sigma_2^2$. When $p \to 0$, let $\sigma_2^2 \to 0$; then $K \to \sigma_1^2$, so that $M_1$ tends to become an unbiased estimate of $K$, distributed as $r_1^{-1}K\chi^2$ on $r_1$ degrees of freedom. Hence $r_1y$ is distributed as $\chi^2$ on $r_1$ degrees of freedom. An approximate $f(x)$ such that $\Pr[r_1y \le r_1f(x)] = \alpha$ is $f(x) = \xi/r_1$, which is thus the limiting solution as $p \to 0$.

When $p \to \infty$, let $K \to 0$ and $\sigma_2^2 \to \sigma_1^2$, so that $M_1/M_2$ tends to become an $F_{r_1, r_2}$ variate. Thus in the limit $y/x$ is an $F_{r_1, r_2}$ variate, and the appropriate $f(x)$ such that $\Pr[y \le f(x)] = \alpha$ is $mx/r_1$, which is thus the limiting solution as $p \to \infty$.
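The limiting argument for $p \to 0$ can be checked by simulation. The sketch below is an illustration under stated assumptions, not part of the original analysis: it takes $K = 1$ (so $\sigma_1^2 = 1 + p$ and $\sigma_2^2 = p$), draws $y = M_1/K$ and $x = M_2/K$ from their scaled $\chi^2$ distributions, and estimates $\Pr[y \le f(x)]$ for the limiting solution $f(x) = \xi/r_1$. The helper names `chi2_cdf`, `chi2_quantile` and `coverage` are hypothetical; $\xi$ is found by bisection on a power series for the incomplete gamma function.

```python
import math
import random


def chi2_cdf(x, k):
    """P(chi^2_k <= x): regularized lower incomplete gamma P(k/2, x/2) via its power series."""
    if x <= 0:
        return 0.0
    s, z = k / 2.0, x / 2.0
    term = 1.0 / s
    total = term
    n = 0
    while term > total * 1e-16:
        n += 1
        term *= z / (s + n)
        total += term
    return total * math.exp(-z + s * math.log(z) - math.lgamma(s))


def chi2_quantile(alpha, k):
    """xi with P(chi^2_k <= xi) = alpha, found by bisection."""
    lo, hi = 0.0, k + 40.0 * math.sqrt(2.0 * k) + 40.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, k) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def coverage(f, p, r1, r2, n=100_000, seed=1):
    """Monte Carlo estimate of Pr[y <= f(x)] with K = 1, sigma1^2 = 1 + p, sigma2^2 = p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        y = (1 + p) * rng.gammavariate(r1 / 2, 2.0) / r1  # y = M1/K ~ (1+p) chi^2_{r1} / r1
        x = p * rng.gammavariate(r2 / 2, 2.0) / r2        # x = M2/K ~ p chi^2_{r2} / r2
        hits += y <= f(x)
    return hits / n


r1, r2, alpha = 8, 50, 0.975
xi = chi2_quantile(alpha, r1)
limit_p0 = lambda x: xi / r1  # limiting solution f(x) = xi / r1 as p -> 0
print(round(xi, 4), coverage(limit_p0, 1e-3, r1, r2), coverage(limit_p0, 1.0, r1, r2))
```

For $p = 10^{-3}$ the estimate should sit close to $\alpha$, while for $p = 1$ the constant solution is far off, which is why Methods II and III add further terms.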


5. Method II. As $p \to 0$, $M_1$ tends to become an unbiased estimate of $K$, so that for small $p$, $f(x) = \xi/r_1$ would approximately satisfy (2.2), as pointed out in the preceding section. This solution neglects terms of order $p$, so the accuracy can be improved by neglecting only higher powers of $p$ instead.

We take $b = \tfrac12 r_2 - 1$ as before, and by the transformations of Section 4 change (2.2) to the form
$$\int_0^\infty \frac{e^{-u}u^b}{b!}\,I_{r_1}\!\left\{\frac{r_1f(2pu/r_2)}{1 + p}\right\}du = \alpha. \tag{5.1}$$
It is easy to see that it is appropriate to seek an $f(x)$ of the form
$$r_1f(x) = b_0 + b_1x + b_2x^2 + b_3x^3 + \cdots.$$


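We now expand the function $I_{r_1}$ in (5.1) about the point where its argument equals $\xi$. When $p = 0$, we find that $b_0 = \xi$. The expansion gives
$$I_{r_1}\!\left\{\frac{b_0 + b_1(2pu/r_2) + b_2(2pu/r_2)^2 + \cdots}{1 + p}\right\} = \alpha + I'_{r_1}(\xi)\left[\varepsilon + \frac{R_2}{2!}\varepsilon^2 + \frac{R_3}{3!}\varepsilon^3 + \cdots\right],$$
where $R_k = I^{(k)}_{r_1}(\xi)/I'_{r_1}(\xi)$ and
$$\varepsilon = \frac{b_0 + b_1(2pu/r_2) + b_2(2pu/r_2)^2 + \cdots}{1 + p} - \xi = \frac{p(2b_1u/r_2 - \xi) + b_2(2pu/r_2)^2 + \cdots}{1 + p}.$$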


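Substitution in (5.1) gives
$$I'_{r_1}(\xi)\int_0^\infty \frac{e^{-u}u^b}{b!}\left[\varepsilon + \frac{R_2}{2!}\varepsilon^2 + \frac{R_3}{3!}\varepsilon^3 + \cdots\right]du = 0.$$
Dividing through by $I'_{r_1}(\xi)/(1 + p)$ gives
$$\int_0^\infty \frac{e^{-u}u^b}{b!}\Big[\{p(2b_1u/r_2 - \xi) + b_2(2pu/r_2)^2 + \cdots\} + \frac{R_2}{2(1 + p)}\{p(2b_1u/r_2 - \xi) + b_2(2pu/r_2)^2 + \cdots\}^2 + \frac{R_3}{3!\,(1 + p)^2}\{p(2b_1u/r_2 - \xi) + \cdots\}^3 + \cdots\Big]du = 0.$$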


Equating to zero the coefficient of $p$ gives $2b_1(b + 1) = r_2\xi$, or $b_1 = \xi$. Similarly, equating to zero the coefficient of $p^2$ gives
$$(2/r_2)^2b_2(b + 2)(b + 1) = -\tfrac12R_2\{(2b_1/r_2)^2(b + 2)(b + 1) + \xi^2 - (4b_1/r_2)(b + 1)\xi\}.$$
Using $b_1 = \xi$ and $2/r_2 = 1/(b + 1)$, this becomes $b_2(b + 2)/(b + 1) = -\tfrac12R_2\xi^2\{(b + 2)/(b + 1) + 1 - 2\}$, and consequently $b_2 = -\tfrac12R_2\xi^2/(b + 2)$.
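Similar consideration of the coefficients of $p^3$ and $p^4$ gives, in turn,
$$b_3 = \frac{b + 1}{(b + 2)(b + 3)}\left\{\frac{R_2^2\xi^3}{b + 1} + \frac{R_2\xi^2}{2} - \frac{R_3\xi^3}{3(b + 1)}\right\},$$
$$b_4 = \frac{(b + 1)^2}{(b + 2)(b + 3)(b + 4)}\left\{\frac{(b + 11)R_2R_3\xi^4}{4(b + 1)^2} - \frac{5R_2^2\xi^3}{2(b + 1)} + \frac{2R_3\xi^3}{3(b + 1)} - \frac{R_2\xi^2}{2} - \frac{(b^2 + 31b + 60)R_2^3\xi^4}{8(b + 1)^2(b + 2)} - \frac{(b + 3)R_4\xi^4}{8(b + 1)^2}\right\}.$$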


This term is as far as this solution was taken, since the work involved increases very rapidly. One would expect this solution to be unreliable for $p \ge 1$, but it will not be used by itself. No simplification seems likely from replacing the $R$'s by their expressions in terms of $\xi$ and $r_1$ in this case.
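The coefficients $b_0, \ldots, b_4$ can be evaluated directly once $\xi$ is known. This sketch assumes that $I_{r_1}$ denotes the $\chi^2_{r_1}$ distribution function and that $R_k = I^{(k)}_{r_1}(\xi)/I'_{r_1}(\xi)$, so the $R$'s follow from derivatives of the logarithm of the $\chi^2_{r_1}$ density $g(t) \propto t^{a}e^{-t/2}$ with $a = \tfrac12 r_1 - 1$. The function name is hypothetical, and $\xi \approx 17.5345$ (the .975 point of $\chi^2_8$) is only an illustrative input.

```python
def solution_ii_coefficients(xi, r1, r2):
    """Coefficients b0..b4 of r1 f(x) = b0 + b1 x + ... (Method II), plus (R2, R3, R4).

    R_k = g^{(k-1)}(xi) / g(xi), g the chi^2_{r1} density, built from derivatives of log g.
    """
    a = r1 / 2 - 1          # exponent in g(t) ~ t^a e^{-t/2}
    b = r2 / 2 - 1          # the paper's b
    h1 = a / xi - 0.5       # (log g)'
    h2 = -a / xi**2         # (log g)''
    h3 = 2 * a / xi**3      # (log g)'''
    R2 = h1                             # g'/g
    R3 = h2 + h1**2                     # g''/g
    R4 = h3 + 3 * h1 * h2 + h1**3       # g'''/g
    b0 = b1 = xi
    b2 = -0.5 * R2 * xi**2 / (b + 2)
    b3 = (b + 1) / ((b + 2) * (b + 3)) * (
        R2**2 * xi**3 / (b + 1) + R2 * xi**2 / 2 - R3 * xi**3 / (3 * (b + 1)))
    b4 = (b + 1)**2 / ((b + 2) * (b + 3) * (b + 4)) * (
        (b + 11) * R2 * R3 * xi**4 / (4 * (b + 1)**2)
        - 5 * R2**2 * xi**3 / (2 * (b + 1))
        + 2 * R3 * xi**3 / (3 * (b + 1))
        - R2 * xi**2 / 2
        - (b**2 + 31 * b + 60) * R2**3 * xi**4 / (8 * (b + 1)**2 * (b + 2))
        - (b + 3) * R4 * xi**4 / (8 * (b + 1)**2))
    return [b0, b1, b2, b3, b4], (R2, R3, R4)


coeffs, (R2, R3, R4) = solution_ii_coefficients(17.5345, 8, 50)
print([round(c, 5) for c in coeffs])
```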




6. Method III. This is rather similar to the last method, but involves the neglect of successive descending powers of $p$; it is therefore suitable for $p > 1$. In Method II a Taylor expansion about a constant was used; in this method the corresponding expansion is about a function of $x$. We look for a solution of (2.3) of the form
$$r_1f(x) = mx + m_0 + m_1x^{-1} + m_2x^{-2} + \cdots,$$

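for which it is legitimate to write
$$I_{r_1}\{r_1f(2pu/r_2)/(1 + p)\} = I_{r_1}\{2mu/r_2\} + [r_1f(2pu/r_2)/(1 + p) - 2mu/r_2]\,I'_{r_1}\{2mu/r_2\} + \cdots + R_n$$
for some $n > 3$, with $R_n$ of order $p^{-n}$. Now, for $N \ge n$,
$$\frac{r_1f(2pu/r_2)}{1 + p} - \frac{2mu}{r_2} = \Big[m_0 - \frac{2mu}{r_2}\Big]p^{-1} + \Big[\frac{m_1r_2}{2u} - \Big(m_0 - \frac{2mu}{r_2}\Big)\Big]p^{-2} + \Big[m_2\Big(\frac{r_2}{2u}\Big)^2 - \Big\{\frac{m_1r_2}{2u} - \Big(m_0 - \frac{2mu}{r_2}\Big)\Big\}\Big]p^{-3} + \cdots + T_N.$$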


Thus (2.3) becomes
$$\int_0^\infty \frac{e^{-u}u^b}{b!}\Big(\{[m_0 - 2mu/r_2]p^{-1} + [m_1r_2/2u - (m_0 - 2mu/r_2)]p^{-2} + \cdots\}\,I'_{r_1}(2mu/r_2) + \tfrac{1}{2!}\{[m_0 - 2mu/r_2]p^{-1} + \cdots\}^2\,I''_{r_1}(2mu/r_2) + \cdots\Big)du = 0.$$


Equating to zero the coefficient of $p^{-1}$ gives
$$\int_0^\infty \frac{e^{-u}u^b}{b!}(m_0 - 2mu/r_2)\,I'_{r_1}(2mu/r_2)\,du = 0,$$
that is,
$$\frac{(m/r_2)^a}{2\,a!\,b!}\int_0^\infty e^{-u(1 + m/r_2)}u^{a+b}(m_0 - 2mu/r_2)\,du = 0.$$
Therefore $m_0 = 2m(a + b + 1)/(r_2 + m) = m(r_1 + r_2 - 2)/(r_2 + m)$.

Similarly, by considering the coefficients of $p^{-2}$ and $p^{-3}$, we find successively
$$m_1 = \tfrac12m(r_1 + r_2 - 2)(r_2 + m)^{-3}\{r_2(m - r_1 + 2) - 2m\},$$
$$m_2 = \tfrac16m(r_1 + r_2 - 2)(r_2 + m)^{-5}\{2m^2(r_2 - 2)(r_2 - 4) - r_2m(3r_2^2 + 7r_1r_2 - 32r_2 - 26r_1 + 76) + r_2^2(r_1 - 2)(5r_1 + 3r_2 - 14)\}.$$


Again, this is as far as this solution was carried, due to the heavy work involved 
in proceeding further. 
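The Method III coefficients, and the conditions that define them, can be checked by quadrature. In this sketch (hypothetical function names, Python standard library only), `series_conditions` uses Simpson's rule to evaluate the coefficients of $p^{-1}$, $p^{-2}$ and $p^{-3}$ in the expanded integral, written with $t = 2mu/r_2$, $D_1 = m_0 - t$, $D_2 = m_1m/t - D_1$ and $D_3 = m_2m^2/t^2 - D_2$. Each should vanish for any $m > 0$ when $m_0$, $m_1$, $m_2$ come from the formulas for them. The value $m = 20$ is an arbitrary illustration; in the paper $m = r_1F_{r_1,r_2}(\alpha)$.

```python
import math


def solution_iii_coefficients(m, r1, r2):
    """m0, m1, m2 of r1 f(x) = m x + m0 + m1/x + m2/x^2 + ... (Method III)."""
    s = r1 + r2 - 2
    m0 = m * s / (r2 + m)
    m1 = 0.5 * m * s * (r2 * (m - r1 + 2) - 2 * m) / (r2 + m)**3
    m2 = m * s / (6 * (r2 + m)**5) * (
        2 * m**2 * (r2 - 2) * (r2 - 4)
        - r2 * m * (3 * r2**2 + 7 * r1 * r2 - 32 * r2 - 26 * r1 + 76)
        + r2**2 * (r1 - 2) * (5 * r1 + 3 * r2 - 14))
    return m0, m1, m2


def series_conditions(m, r1, r2, upper=200.0, n=20000):
    """Simpson estimates of the p^-1, p^-2, p^-3 coefficients that Method III sets to zero."""
    a, b = r1 / 2 - 1, r2 / 2 - 1
    m0, m1, m2 = solution_iii_coefficients(m, r1, r2)

    def terms(u):
        t = 2 * m * u / r2                                          # argument of I_{r1}
        log_w = -u + b * math.log(u) - math.lgamma(b + 1)           # Gamma(b+1) weight
        log_g = a * math.log(t) - t / 2 - (a + 1) * math.log(2) - math.lgamma(a + 1)  # chi^2 pdf
        wg = math.exp(log_w + log_g)
        g1 = a / t - 0.5                                            # g'/g
        g2 = -a / t**2 + g1**2                                      # g''/g
        d1 = m0 - t
        d2 = m1 * m / t - d1
        d3 = m2 * m**2 / t**2 - d2
        return (wg * d1,
                wg * (d2 + 0.5 * d1**2 * g1),
                wg * (d3 + d1 * d2 * g1 + d1**3 * g2 / 6))

    h = upper / n
    sums = [0.0, 0.0, 0.0]
    for i in range(1, n + 1):          # the integrand vanishes at u = 0 when a + b > 2
        weight = 1 if i == n else (4 if i % 2 else 2)
        for k, v in enumerate(terms(i * h)):
            sums[k] += weight * v
    return [s * h / 3 for s in sums]


print(solution_iii_coefficients(20.0, 8, 50))
print(series_conditions(20.0, 8, 50))
```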


7. Final approximate solution. A type A function will be defined to be of the form $\phi(x) = (1 + x)^{-r}(a_{r+1}x^{r+1} + a_rx^r + \cdots + a_1x + a_0)$. It is evident that a function of this type can be put into the form of a type II or III solution, so as to agree with the first $(r + 2)$ terms in either expansion. Further, solutions of the form I, II, or III (save that only a finite number of terms are considered, so that $(r + 2)$ of the calculated constants are involved) can be put into the form of type A. In this way Solution I was used to check Solutions II and III.

For a final solution a type A function is formed using Solutions II and III. Since four constants of Solution III and five of Solution II have been calculated, $r + 2 = 4 + 5$, so that $r = 7$. The coefficients $a_0$, $a_1$, $a_2$, $a_3$, and $a_4$ are calculated from $b_0$, $b_1$, $b_2$, $b_3$, and $b_4$, and the coefficients $a_5$, $a_6$, $a_7$, and $a_8$ from $m$, $m_0$, $m_1$, and $m_2$. We put
$$a_0 + a_1x + a_2x^2 + a_3x^3 + a_4x^4 \quad\text{and}\quad a_5x^5 + a_6x^6 + a_7x^7 + a_8x^8$$
respectively equal to the corresponding terms in
$$(1 + x)^7(b_0 + b_1x + b_2x^2 + b_3x^3 + b_4x^4) \quad\text{and}\quad (1 + x)^7(mx + m_0 + m_1x^{-1} + m_2x^{-2}).$$
Hence
$$a_0 = b_0, \qquad a_1 = b_1 + 7b_0, \qquad a_2 = b_2 + 7b_1 + 21b_0,$$
$$a_3 = b_3 + 7b_2 + 21b_1 + 35b_0, \qquad a_4 = b_4 + 7b_3 + 21b_2 + 35b_1 + 35b_0,$$
$$a_5 = m_2 + \tbinom71m_1 + \tbinom72m_0 + \tbinom73m = m_2 + 7m_1 + 21m_0 + 35m,$$
$$a_6 = m_1 + \tbinom71m_0 + \tbinom72m = m_1 + 7m_0 + 21m,$$
$$a_7 = m_0 + \tbinom71m = m_0 + 7m, \qquad a_8 = m.$$
Using these values, we take
$$r_1f(x) = (1 + x)^{-7}(a_8x^8 + a_7x^7 + \cdots + a_1x + a_0).$$
From Solutions II and III we obtain
$$m = r_1F_{r_1, r_2}(\alpha), \qquad m_0 = m(r_2 + m)^{-1}(r_1 + r_2 - 2),$$
$$m_1 = \tfrac12m(r_2 + m)^{-3}(r_1 + r_2 - 2)[m(r_2 - 2) - r_2(r_1 - 2)] = \tfrac12m_0(r_2 + m)^{-2}[m(r_2 - 2) - r_2(r_1 - 2)],$$
$$m_2 = \tfrac16m(r_2 + m)^{-5}(r_1 + r_2 - 2)[2m^2(r_2 - 2)(r_2 - 4) - mr_2(3r_2^2 + 7r_1r_2 - 32r_2 - 26r_1 + 76) + r_2^2(r_1 - 2)(5r_1 + 3r_2 - 14)];$$
$$b_0 = \xi, \qquad b_1 = \xi, \qquad b_2 = -\tfrac12R_2\xi^2/(b + 2),$$
$$b_3 = \frac{b + 1}{(b + 2)(b + 3)}\left\{\frac{R_2^2\xi^3}{b + 1} + \frac{R_2\xi^2}{2} - \frac{R_3\xi^3}{3(b + 1)}\right\},$$
$$b_4 = \frac{(b + 1)^2}{(b + 2)(b + 3)(b + 4)}\left\{\frac{(b + 11)R_2R_3\xi^4}{4(b + 1)^2} - \frac{5R_2^2\xi^3}{2(b + 1)} + \frac{2R_3\xi^3}{3(b + 1)} - \frac{R_2\xi^2}{2} - \frac{(b^2 + 31b + 60)R_2^3\xi^4}{8(b + 1)^2(b + 2)} - \frac{(b + 3)R_4\xi^4}{8(b + 1)^2}\right\}.$$

The solution thus derived will be called Solution IV.
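The recipe of Section 7 is mechanical enough to sketch directly. The code below (hypothetical names; the $b$ and $m$ values in the test are made-up numbers, not results from the paper) forms $a_0, \ldots, a_8$ by matching powers of $x$ in $(1 + x)^7$ times each series and evaluates $r_1f_{IV}(x)$. Near $x = 0$ the result should agree with the Method II series up to terms in $x^5$, and for large $x$ with the Method III series up to terms in $x^{-3}$.

```python
from math import comb


def type_a_coefficients(b, m_terms, r=7):
    """a_0..a_{r+1} of (1 + x)^{-r} sum_k a_k x^k from b = [b0..b4] (Method II)
    and m_terms = [m, m0, m1, m2] (Method III); needs len(b) + len(m_terms) == r + 2."""
    m, m0, m1, m2 = m_terms
    a = []
    for k in range(len(b)):                 # a_0..a_4: x^k coefficient of (1+x)^r * sum b_j x^j
        a.append(sum(b[j] * comb(r, k - j) for j in range(k + 1)))
    for k in range(len(b), r + 2):          # a_5..a_8: x^k coefficient of (1+x)^r (m x + m0 + m1/x + m2/x^2)
        a.append(m * comb(r, k - 1) + m0 * comb(r, k) + m1 * comb(r, k + 1) + m2 * comb(r, k + 2))
    return a


def r1_f_iv(x, a, r=7):
    """r1 f_IV(x) = (1 + x)^{-r} (a_{r+1} x^{r+1} + ... + a_1 x + a_0)."""
    return sum(ak * x**k for k, ak in enumerate(a)) / (1 + x)**r


a = type_a_coefficients([2.0, 2.0, -0.3, 0.1, -0.05], [3.0, 1.5, 0.2, -0.1])
print(a, r1_f_iv(1.0, a))
```

Note that `math.comb(n, k)` returns 0 for `k > n`, which is what drops the missing terms in $a_6$, $a_7$ and $a_8$.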
8. Accuracy of solution. It can be shown that for $c$ large compared with $a$,
$$\int_0^c (y^ae^{-y}/a!)\,dy = 1 - o(1).$$
Thus $r_1f(x)/(1 + p)$ cannot be large compared with $r_1$; if it were, the left side of (2.2) would be $1 - o(1)$ and the equation would not be satisfied. So $f(x)$ must be $O(1)$; similarly, $\xi$ must be $O(r_1)$.

Now for Solution I to exist for large $r_2$ there must be at least a finite $k$ such that $f_s(x)$ is $O(r_1^{ks})$. Since $\xi$ is $O(r_1)$, $f_0$, $f_1$, and $f_2$ are of orders $r_1^0$, $r_1$, and $r_1^2$, respectively, which suggests $k = 1$. That this $k$ will suffice, or even that there exists a suitable $k \le 1$, has not been proved and can only be conjectured. Fortunately, it is not necessary to make any such assumption about the value.

For quick (or even any) convergence of Solution I it is necessary that $r_1^k$ be of smaller order than $r_2$ for a suitable choice of $k$. In practical cases $r_2$ is usually greater than $r_1$, and is often large compared with it, for $K$ positive.

It can be shown that the $f_s(x)$ of Solution I is of type $A_{2s+1}$, where "type $A_r$" will mean "of the form $(1 + x)^{-r}(a_{r+1}x^{r+1} + a_rx^r + \cdots + a_1x + a_0)$." If Solution I were developed as far as $f_3(x)$, it would yield a type $A_7$ function, which could be compared with the type $A_7$ function of Solution IV. The former of these two type $A_7$ functions is correct to the order $r_2^{-3}$. Thus when it is expanded in ascending or descending powers of $x$, the resulting coefficients are correct to the order $r_2^{-3}$, and thus differ from the exact ones obtained from Methods II and III, respectively, by terms of the order $r_2^{-4}$.

Consequently it is readily seen that the type $A_7$ form of Solution I and that of Solution IV differ only by terms of the order $r_2^{-4}$, so that Solution IV is correct to the order $r_2^{-3}$. Using the consequences of the above discussion, we see that Solution IV is correct to the order $(r_2/r_1)^{-3}$.

Further, if Solution IV is put in place of $f(x)$ in (2.2), the error involved for small $p$ will be of the order $p^5$. Similarly, the error involved in using Solution IV for large $p$ is of the order $p^{-4}$. These statements apply whether or not there exists an $f(x)$ exactly satisfying this equation.





682 J. R. GREEN 


For convenience of calculation one could, of course, use fewer leading terms of Solutions II and III to form a less accurate Solution IV. The error involved in substituting this Solution IV into (2.2) may be rather less than the error in the approximate $f(x)$, particularly if the upper limit of the inner integral is sufficiently large or small, since the rate of change of the inner integral with respect to its upper limit will then be negligible.

However, when tables have been prepared using the solution given in this 
paper, there will be no need to use a less accurate approximation to save labour. 

The solution given in Section 7, say $f_{IV}(x)$, has been calculated for $r_1 = 8$, $r_2 = 50$, $\alpha = .975$, and a series of $x$ values. The left side of (2.2) was then calculated for $p = 1$. One would not expect this to give a particularly accurate value for a function correct to the orders $p^5$ for small $p$ and $p^{-4}$ for large $p$. Further, $r_2/r_1 = 6.25$, which is not very large. Thus one would expect the majority of practical cases to be more favourable than the one chosen. By numerical integration, the left side of (2.2) was found, correct to five significant figures, to be .97492, a satisfactory approximation to $\alpha$.


9. Obtaining and using the confidence limit. If $f_{IV}$ is of suitable form, a suitable approximate confidence limit for $K$ will be given by solving
$$M_1/K = f_{IV}(M_2/K),$$
where $f_{IV}$ is the function given by Solution IV. In view of the complicated form of $f_{IV}(x)$, a numerical method of solution will evidently be necessary. Since $x = M_2/K$ and $y = M_1/K$, the ratio $y/x$ is $M_1/M_2$, which has an observed value. Thus the confidence limit $K_\alpha$ is given by the intersection $(x_0, y_0)$ of the curve $y = f_{IV}(x)$ with the line $y = (M_1/M_2)x$, since
$$K_\alpha = M_1/y_0 = M_2/x_0.$$
This gives a lower limit such that $\Pr(K_\alpha \le K) = \alpha$.
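Because $f_{IV}$ has no convenient closed-form inverse, the limit has to be found numerically. The following is a minimal bisection sketch under two assumptions, both taken from the discussion under (i) below: $f(0) > 0$, and the curve and the line cross at most once. The function name is hypothetical, and the curve in the test is a straight line standing in for $f_{IV}$, chosen so the answer can be checked by hand.

```python
def confidence_limit(f, m1, m2, x_max=1e12):
    """K_alpha = M2 / x0, where x0 > 0 solves f(x0) = (M1/M2) x0, found by bisection.

    Returns None if the line y = (M1/M2) x does not meet the curve on (0, x_max].
    """
    q = m1 / m2
    d = lambda x: f(x) - q * x        # positive near x = 0 because f(0) > 0
    lo, hi = 0.0, 1.0
    while d(hi) > 0:                  # expand the bracket until the line is above the curve
        hi *= 2.0
        if hi > x_max:
            return None
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if d(mid) > 0:
            lo = mid
        else:
            hi = mid
    x0 = 0.5 * (lo + hi)
    return m2 / x0                    # equals M1 / y0, since y0 = (M1/M2) x0
```

With the stand-in curve $f(x) = 2 + 1.5x$, $M_1 = 10$ and $M_2 = 2$, the crossing is at $x_0 = 2/3.5$, giving $K_\alpha = 3.5$.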

Certain questions arise immediately: 

(i). How can one be sure that there will be only one point of intersection? 

(ii). How can one be sure that there will be any point of intersection? 

(iii). The previous results depend on the assumption that $K > 0$. What modifications are required for the case where the sign of $K$ is not known?

These matters will be considered in turn. 

(i). The complicated expression for $f_{IV}$ makes uniqueness of intersection difficult to prove. A sufficient condition for there to be no more than one point of intersection (since the asymptote to the curve and the curve itself intersect the $y$-axis at positive values of $y$) is that the slope of the curve $y = f_{IV}(x)$ should be a monotonically increasing or decreasing function of $x$. The asymptote to the curve is given by the first two terms of Solution III, the second of which (that independent of $x$) is positive. This condition is satisfied in the particular example mentioned at the end of the previous section, in which the slope is monotonically increasing. Further, the condition is satisfied when $r_2$ is sufficiently large compared with $r_1$; also, the slope of the curve is increasing when $\alpha$ is chosen to correspond to the lower limit, and decreasing in the upper-limit case, provided the two appropriate choices of $\alpha$ are both reasonably different from 50 per cent. Hence, at least for $r_2$ sufficiently large compared with $r_1$, if not more generally (as the author conjectures), there is no more than one confidence limit corresponding to one value of $\alpha$.

(ii). When the slope is monotonically increasing or decreasing, there will evidently be no intersection of the line with the curve unless $M_1/M_2$ is greater than or equal to the asymptotic slope of the curve, which equals $F_{r_1, r_2}(\alpha)$. Now $M_1/M_2$ is distributed as
$$(\sigma_1^2/\sigma_2^2)F_{r_1, r_2} = [(1 + p)/p]F_{r_1, r_2}.$$
Thus the probability of nonintersection is the probability of an $F_{r_1, r_2}$ variate not exceeding $pF_{r_1, r_2}(\alpha)/(1 + p)$. This probability is $\alpha$ when $p = \infty$, and decreases with decreasing $p$ to zero at $p = 0$. Since $M_1$ and $M_2$ are positive, an intersection in the first quadrant evidently leads to a positive $K_\alpha$. Further, it can easily be shown that, as $M_1/M_2 \to F_{r_1, r_2}(\alpha)$ from above, $K_\alpha \to 0$ from above.
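The nonintersection probability in (ii) needs only the $F$ distribution function. The sketch below computes it with the standard continued-fraction evaluation of the regularized incomplete beta function (modified Lentz scheme). The function names are hypothetical, only the Python standard library is used, and the degrees of freedom in the example are those of the numerical check in Section 8.

```python
import math


def betacf(a, b, x, max_iter=300, eps=1e-15):
    """Continued fraction for the regularized incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))           # even step
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))   # odd step
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betai(a, b, x):
    """Regularized incomplete beta I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * betacf(a, b, x) / a
    return 1.0 - bt * betacf(b, a, 1.0 - x) / b


def f_cdf(q, d1, d2):
    """Pr[F_{d1,d2} <= q]."""
    return 0.0 if q <= 0 else betai(d1 / 2.0, d2 / 2.0, d1 * q / (d1 * q + d2))


def f_quantile(alpha, d1, d2):
    """F_{d1,d2}(alpha) by bracketing and bisection."""
    lo, hi = 0.0, 1.0
    while f_cdf(hi, d1, d2) < alpha:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f_cdf(mid, d1, d2) < alpha else (lo, mid)
    return 0.5 * (lo + hi)


def prob_no_intersection(p, alpha, r1, r2):
    """Pr[F_{r1,r2} <= p F_{r1,r2}(alpha) / (1 + p)]."""
    return f_cdf(p * f_quantile(alpha, r1, r2) / (1.0 + p), r1, r2)


for p in (0.01, 1.0, 100.0, 1e6):
    print(p, round(prob_no_intersection(p, 0.975, 8, 50), 4))
```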
At this stage it is convenient to consider (iii) along with (ii). All the previous investigation of a solution providing a confidence limit for $K$ could have been carried out in exactly the same way with $M_1$ and $M_2$ and all related quantities interchanged. In this way an approximation to a function $h$ such that
$$\Pr\{M_2/(-K) \le h[M_1/(-K)]\} = \alpha$$
independently of a nuisance parameter 
p’ = — K/o; = —K/(K + 03) = —1/(1 +p") = —p/(1 + p) 

would have been obtained. An approximate $h_{IV}$ would have been derived, equal to $f_{IV}$ with $r_1$ and $r_2$ interchanged. A variation of $p'$ between 0 and $\infty$ corresponds to one of $p$ between 0 and $-1$, and is appropriate for $K$ negative. The interchanged ranges are appropriate for $K$ positive. The accuracy of $h_{IV}$ with respect to $p'$ would be the same as that of $f_{IV}$ with respect to $p$ as far as neglected orders are concerned. However, if $r_2$ is large compared with $r_1$, favouring the accuracy of $f_{IV}$, then the accuracy of $h_{IV}$ would not be so favoured; the reverse is true for $r_1$ large compared with $r_2$.

It will be seen to be satisfactory to use the curve $y = f_{IV}(x)$ in the first quadrant together with a second curve, $-x = h_{IV}(-y)$, in the third quadrant to provide a confidence limit. However, if the coefficient $\alpha$ is chosen for the first curve, then $1 - \alpha$ will be taken for the second. The reasons for this are explained in the following paragraphs.

Since $F_{r_1, r_2}(\alpha)\,F_{r_2, r_1}(1 - \alpha) = 1$, the asymptote to the first curve in the first quadrant will be parallel to the asymptote to the second curve in the third. Thus a straight line through the origin will intersect the first curve if its slope is greater than $F_{r_1, r_2}(\alpha)$ and the second curve if its slope is less than this value, while if its slope is exactly equal to this value, it intersects both curves at infinity.
The set of points $S$, consisting of those points of the first quadrant lying below the first curve and those of the third quadrant lying above the second curve, will be used to obtain a confidence limit for $K$. If $K > 0$, the probability density function is zero outside the first quadrant, and the probability of $M_1$ and $M_2$ being such that $(x, y)$ lies beneath the first curve is $\alpha$. If $K < 0$, the probability density function is zero outside the third quadrant, and the probability of $(x, y)$ lying above the second curve is $\alpha$.

Thus, whether $K$ is greater or less than zero, the probability is $\alpha$ that $M_1$ and $M_2$ are such that $(x, y)$ lies in $S$. Just as an intersection of the first curve with the line $y = (M_1/M_2)x$ gives a positive value of $K_\alpha$, an intersection of the second curve with this line gives a negative value. Further, it can easily be shown that $K_\alpha \to 0$ from below as $M_1/M_2 \to F_{r_1, r_2}(\alpha)$ from below, or equivalently as $M_2/M_1 \to F_{r_2, r_1}(1 - \alpha)$ from above.

Thus the two curves together provide a lower confidence limit which falls below $K$ with probability $\alpha$. Equivalently, they provide an upper limit with coefficient $1 - \alpha$. Accordingly, two suitable values of $\alpha$ are selected, one for each limit, and an interval is obtained. The values .025 and .975, giving an interval coefficient of .95, are frequently used in practice. The complicated form of $K_\alpha$ does not lend itself to an examination of which pair of values of $\alpha$ having a given difference (confidence coefficient) yields the shortest interval.

Incidentally, the curves obtained by mapping the two curves radially through the origin into the opposite quadrants can easily be shown to form the curved parts of the boundary of an alternative set of points which yields a confidence limit. However, it is usual to have $K > 0$ and $r_2 > r_1$, and one would prefer the positive confidence limit to be the more accurate. To ensure this, the two curves should be used as discussed above.

Under (iii), it remains to be decided whether or not the confidence coefficient is affected by using only that part of the confidence interval which has the same sign as $K$, if this sign is known. Consider the lower limit with coefficient $\alpha_1$ when $K$ is known to be positive:
$$\begin{aligned}\Pr\{\max(0, K_{\alpha_1}) \le K \mid K > 0\} &= 1 - \Pr\{\max(0, K_{\alpha_1}) > K \mid K > 0\}\\ &= 1 - \Pr\{K_{\alpha_1} > K \mid K > 0\}\\ &= 1 - (1 - \alpha_1) = \alpha_1 = \Pr\{K_{\alpha_1} \le K\}.\end{aligned}$$
A similar discussion applies when $K$ is known to be negative, and also for the upper limit when $K$ has known sign. Hence the natural procedure does not distort the confidence coefficient.


In the Introduction, we discussed the use of this confidence interval, or a single limit, for testing a hypothetical value of $K$. When $K$ is known to be greater than or equal to zero and the hypothesis to be tested is $K = 0$, the use of the upper limit alone is more appropriate. In this case the hypothesis is rejected if $M_1/M_2 > F_{r_1, r_2}(\alpha)$, which is the usual test of the analysis of variance.

However, throughout this paper there is one possibility which has not been discussed, nor is it obvious how it could be, considering the technique that has been used. This is the case $K = 0$, in which the above derivation of a confidence interval would be completely invalid. That is not to say that the interval does not apply in this case (the author conjectures that it does), but the point is simply not proved either way, although it is true that $\Pr[y/x \le f(x)/x] = \alpha$ in the limit as $K \to 0$. Thus the confidence interval carries with it the perhaps unnecessary proviso that $K$ is not zero.


10. Tabulation. For the practical use of the two curves discussed in the preceding section to obtain a confidence limit, the following procedure seems the most satisfactory. For each selected value of $\alpha$, the values of $f_{IV}(x)/x$ are tabulated for different values of $r_1$, $r_2$, and $x$. It might then be advisable to retabulate, so that for each set of values of $r_1$, $r_2$, and $f_{IV}(x)/x$ a value of $x$, or of $f_{IV}(x)$, is tabulated; otherwise use of the table would require inverse interpolation. To use the hypothetical table:

if $M_1/M_2 > F_{r_1, r_2}(\alpha)$, it is set equal to $f_{IV}(x)/x$ and, by direct interpolation, the appropriate value of $x = M_2/K_\alpha$ is obtained; since $M_2$ is known, $K_\alpha$ can then be derived;

if $M_1/M_2 = F_{r_1, r_2}(\alpha)$, then $K_\alpha = 0$;

if $M_1/M_2 < F_{r_1, r_2}(\alpha)$, then $r_2$, $r_1$, and $(1 - \alpha)$ are used as new values of $r_1$, $r_2$, and $\alpha$, respectively, in the first procedure.

The expression for $f_{IV}(x)$ is very complicated, and the tabulation discussed above, for a suitable selection of values of $r_1$, $r_2$, and $x$, would require tens of thousands of cells. Accordingly the table has not been constructed for inclusion in this paper, and that task remains.
A SIMPLE SEQUENTIAL PROCEDURE FOR TESTING
STATISTICAL HYPOTHESES¹


By Chia Kwei Tsao
Wayne University 


Summary. In this paper a simple sequential test is suggested. The distribution of the sample size, its moment generating function, the power function of the test, and the ASN (average sample number) function are obtained. The set of relative optimum zones for making decisions is shown to be uniquely determined. The existence of a class of sets of absolute optimum zones is proved. The suggested test is shown to be consistent. Some possible applications are discussed and a few numerical efficiencies are calculated.


1. Introduction. Let $\{f(x)\}$ be the class of all continuous pdf's (probability density functions) defined over a space $S$. Let random observations be drawn successively from a population having an unknown continuous pdf $f(x)$. Let the simple hypothesis $H_0: f(x) = f_0(x)$ be tested against a certain alternative or a certain class of alternatives. We shall propose a simple sequential test procedure and investigate the properties of the test.

To test the null hypothesis $H_0: f(x) = f_0(x)$, we divide $S$ into three mutually exclusive sets (zones):

$S_1$ is the zone of preference for acceptance;

$S_2$ is the zone of indifference;

$S_3$ is the zone of preference for rejection.

Random observations are drawn successively. At each stage, the number of observations falling in each of the three zones is counted. Let $m_i$ be the number of observations falling in the zone $S_i$ ($i = 1, 2, 3$) at the $m$th stage (i.e., after the $m$th observation has been drawn). Let $a$ and $r$ be two predetermined positive integers. Continue to draw observations as long as $m_1 < a$ and $m_3 < r$. The experiment is discontinued as soon as either $m_1 = a$ or $m_3 = r$. The null hypothesis is accepted if $m_1 = a$, and rejected if $m_3 = r$.
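A direct simulation makes the stopping rule concrete. Since only the zone in which each observation falls matters, this sketch draws zone labels with probabilities $A$, $I$, $R$ (the zone probabilities defined in Section 3) instead of raw observations. The names `sequential_test` and `zone_sampler` are hypothetical and the probabilities in the example are made up.

```python
import random


def sequential_test(draw_zone, a, r):
    """Run the test; draw_zone() returns 1, 2 or 3, the zone S_i of the next observation.

    Stops as soon as m1 == a (accept H0) or m3 == r (reject H0); returns (decision, m).
    """
    counts = {1: 0, 2: 0, 3: 0}
    m = 0
    while counts[1] < a and counts[3] < r:
        counts[draw_zone()] += 1
        m += 1
    return ("accept" if counts[1] == a else "reject"), m


def zone_sampler(A, I, R, rng):
    """Hypothetical helper: zone of one observation when f gives S1, S2, S3 masses A, I, R."""
    def draw():
        u = rng.random()
        return 1 if u < A else (2 if u < A + I else 3)
    return draw


rng = random.Random(0)
runs = [sequential_test(zone_sampler(0.3, 0.5, 0.2, rng), a=3, r=2) for _ in range(10000)]
print(sum(d == "reject" for d, _ in runs) / len(runs), sum(m for _, m in runs) / len(runs))
```

The printed rejection rate and mean sample size are Monte Carlo estimates of the power and ASN functions derived in Section 4.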

For simplicity, we shall restrict $S$ to be $n$-dimensional Euclidean space (or a subspace of it) and assume, of course, that the pdf $f(x)$ is continuous in $S$. However, most of the theorems given in this paper can be extended to more general cases with slight modifications.


2. Fundamental lemma. The principal aim of this section is to prove a lemma which will be used to obtain the moment generating function of the sample size and the power function of the test.

Suppose $m$, $p$, and $q$ are positive integers and $B$, $C$, and $D$ are positive real numbers. Let
$$h_m(p, q, B, C, D) = \sum_{x=0}^{p-1}\frac{(m - 1)!}{(q - 1)!\,x!\,(m - q - x)!}B^qC^xD^m, \qquad h(p, q, B, C, D) = \sum_{m=q}^{\infty}h_m(p, q, B, C, D), \tag{2.1}$$
where terms with $m - q - x < 0$ are taken to be zero. Then we have the following

Lemma 1. If $D < 1$ and $D + CD < 1$, then
$$h(p, q, B, C, D) = \left(\frac{BD}{1 - D - CD}\right)^q\left(1 - \int_0^{CD/(1-D)}\frac{(p + q - 1)!}{(p - 1)!\,(q - 1)!}z^{p-1}(1 - z)^{q-1}\,dz\right). \tag{2.2}$$

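Proof. From definition (2.1) we have
$$\begin{aligned}h(p, q, B, C, D) &= \sum_{m=q}^\infty\sum_{x=0}^{p-1}\frac{(m - 1)!}{(q - 1)!\,x!\,(m - q - x)!}B^qC^xD^m\\ &= \sum_{x=0}^{p-1}\frac{B^qC^xD^{q+x}}{(q - 1)!\,x!}\sum_{m=q+x}^\infty\frac{(m - 1)!}{(m - q - x)!}D^{m-q-x}\\ &= \sum_{x=0}^{p-1}\frac{(q + x - 1)!}{(q - 1)!\,x!}\,\frac{B^qC^xD^{q+x}}{(1 - D)^{q+x}}\\ &= \left(\frac{BD}{1 - D}\right)^q\sum_{x=0}^{p-1}\binom{q + x - 1}{x}\left(\frac{CD}{1 - D}\right)^x\\ &= \left(\frac{BD}{1 - D - CD}\right)^q\sum_{x=0}^{p-1}\binom{q + x - 1}{x}\left(\frac{CD}{1 - D}\right)^x\left(1 - \frac{CD}{1 - D}\right)^q\\ &= \left(\frac{BD}{1 - D - CD}\right)^q\left(1 - \int_0^{CD/(1-D)}\frac{(p + q - 1)!}{(p - 1)!\,(q - 1)!}z^{p-1}(1 - z)^{q-1}\,dz\right).\end{aligned}$$
The last step uses the negative binomial identity $\sum_{x=0}^{p-1}\binom{q+x-1}{x}w^x(1 - w)^q = 1 - \int_0^w \frac{(p+q-1)!}{(p-1)!\,(q-1)!}z^{p-1}(1 - z)^{q-1}\,dz$ for $0 < w < 1$. Thus Lemma 1 is proved.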
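Lemma 1 is easy to check numerically. The sketch below sums the series (2.1) directly (in log space, so large factorials cannot overflow) and compares the result with the closed form (2.2). For positive integers $p$, $q$ it evaluates the beta integral through the identity $\int_0^w \frac{(p+q-1)!}{(p-1)!(q-1)!}z^{p-1}(1-z)^{q-1}\,dz = \Pr[\mathrm{Bin}(p+q-1, w) \ge p]$. Function names and argument values are hypothetical.

```python
import math


def h_series(p, q, B, C, D, m_max=400):
    """h(p, q, B, C, D) of (2.1), summed directly (in log space) for m = q, ..., m_max."""
    total = 0.0
    for m in range(q, m_max + 1):
        for x in range(min(p - 1, m - q) + 1):      # terms with m - q - x < 0 are zero
            log_term = (math.lgamma(m) - math.lgamma(q) - math.lgamma(x + 1)
                        - math.lgamma(m - q - x + 1)
                        + q * math.log(B) + x * math.log(C) + m * math.log(D))
            total += math.exp(log_term)
    return total


def beta_integral(w, p, q):
    """int_0^w (p+q-1)!/((p-1)!(q-1)!) z^(p-1) (1-z)^(q-1) dz = P[Bin(p+q-1, w) >= p]."""
    n = p + q - 1
    return sum(math.comb(n, k) * w**k * (1 - w)**(n - k) for k in range(p, n + 1))


def h_closed(p, q, B, C, D):
    """Right-hand side of (2.2); Lemma 1 requires D < 1 and D + C D < 1."""
    if not (D < 1 and D + C * D < 1):
        raise ValueError("Lemma 1 needs D < 1 and D + CD < 1")
    return (B * D / (1 - D - C * D)) ** q * (1 - beta_integral(C * D / (1 - D), p, q))


print(h_series(3, 2, 0.7, 0.4, 0.5), h_closed(3, 2, 0.7, 0.4, 0.5))
```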


3. Distribution of the sample size and its moment generating function. The distribution of the sample size, $(a, r, S_1, S_2, S_3)$ being chosen, depends upon the true underlying distribution $f(x)$ being tested. In this section, we derive the pdf $g_f(m; a, r, S_1, S_3)$ of the sample size $m$ and its mgf (moment generating function) $M_f(t; a, r, S_1, S_3)$ under the assumption that $f(x)$ is the true underlying distribution and the set of parameters $(a, r, S_1, S_2, S_3)$ is predetermined.

Throughout this paper, we shall denote by $A$, $I$, and $R$ the following three quantities:
$$A = \int_{S_1}f(x)\,dx, \qquad I = \int_{S_2}f(x)\,dx, \qquad R = \int_{S_3}f(x)\,dx. \tag{3.1}$$
We shall denote these quantities by

(a) $A_i$, $I_i$, and $R_i$, if $f(x)$ is replaced by $f_i(x)$ for $i = 0$ or 1;

(b) $A'$, $I'$, and $R'$, if $(S_1, S_2, S_3)$ is replaced by $(S_1', S_2', S_3')$;

(c) $A_i'$, $I_i'$, and $R_i'$, if $f(x)$ is replaced by $f_i(x)$ for $i = 0$ or 1, and $(S_1, S_2, S_3)$ by $(S_1', S_2', S_3')$.

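With the definitions (3.1), it is easily seen that the pdf is given by
$$g_f(m; a, r, S_1, S_3) = \sum_{x=0}^{a-1}\frac{(m - 1)!}{(r - 1)!\,x!\,(m - r - x)!}R^rA^xI^{m-r-x} + \sum_{x=0}^{r-1}\frac{(m - 1)!}{(a - 1)!\,x!\,(m - a - x)!}A^aR^xI^{m-a-x}. \tag{3.2}$$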
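Therefore the mgf is given by
$$M_f(t; a, r, S_1, S_3) = \sum_{m=r}^\infty\sum_{x=0}^{a-1}\frac{(m - 1)!}{(r - 1)!\,x!\,(m - r - x)!}R^rA^xI^{m-r-x}e^{mt} + \sum_{m=a}^\infty\sum_{x=0}^{r-1}\frac{(m - 1)!}{(a - 1)!\,x!\,(m - a - x)!}A^aR^xI^{m-a-x}e^{mt}. \tag{3.3}$$
This can be written as
$$\begin{aligned}M_f(t; a, r, S_1, S_3) &= \sum_{m=r}^\infty\sum_{x=0}^{a-1}\frac{(m - 1)!}{(r - 1)!\,x!\,(m - r - x)!}\left(\frac{R}{I}\right)^r\left(\frac{A}{I}\right)^x(Ie^t)^m + \sum_{m=a}^\infty\sum_{x=0}^{r-1}\frac{(m - 1)!}{(a - 1)!\,x!\,(m - a - x)!}\left(\frac{A}{I}\right)^a\left(\frac{R}{I}\right)^x(Ie^t)^m\\ &= h(a, r, R/I, A/I, Ie^t) + h(r, a, A/I, R/I, Ie^t).\end{aligned} \tag{3.4}$$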
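Thus, by Lemma 1, for $t$ such that $(A + I)e^t < 1$ and $(R + I)e^t < 1$, the mgf can be written as
$$\begin{aligned}M_f(t; a, r, S_1, S_3) &= \left(\frac{Re^t}{1 - Ie^t - Ae^t}\right)^r\left(1 - \int_0^{Ae^t/(1 - Ie^t)}\frac{(a + r - 1)!}{(a - 1)!\,(r - 1)!}z^{a-1}(1 - z)^{r-1}\,dz\right)\\ &\quad + \left(\frac{Ae^t}{1 - Ie^t - Re^t}\right)^a\left(1 - \int_0^{Re^t/(1 - Ie^t)}\frac{(a + r - 1)!}{(r - 1)!\,(a - 1)!}z^{r-1}(1 - z)^{a-1}\,dz\right).\end{aligned} \tag{3.5}$$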

4. The power and ASN functions. Suppose the set of parameters $(a, r, S_1, S_2, S_3)$ is predetermined. Then it is easily seen that for any alternative $f(x)$, the power function is given by
$$\varphi(f; a, r, S_1, S_3) = \sum_{m=r}^\infty\sum_{x=0}^{a-1}\frac{(m - 1)!}{(r - 1)!\,x!\,(m - r - x)!}R^rA^xI^{m-r-x}. \tag{4.1}$$
By Lemma 1 (with $D = I$, so that $1 - D - CD = R$ because $A + I + R = 1$), this can be written as
$$\begin{aligned}\varphi(f; a, r, S_1, S_3) &= h(a, r, R/I, A/I, I) = 1 - \int_0^{A/(A+R)}\frac{(a + r - 1)!}{(a - 1)!\,(r - 1)!}z^{a-1}(1 - z)^{r-1}\,dz\\ &= \int_0^\beta\frac{(a + r - 1)!}{(a - 1)!\,(r - 1)!}z^{r-1}(1 - z)^{a-1}\,dz,\end{aligned} \tag{4.2}$$
where $\beta = \beta(f; S_1, S_3) = 1/(1 + A/R)$.
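For integer $a$ and $r$ the power (4.2) is a binomial tail: $\varphi = \Pr[\mathrm{Bin}(a + r - 1, \beta) \ge r]$, the chance that at least $r$ of the first $a + r - 1$ observations falling outside $S_2$ land in $S_3$. The sketch below compares this with a direct summation of (4.1). Names and zone probabilities are hypothetical.

```python
import math


def power_closed(a, r, A, R):
    """phi = P[Bin(a + r - 1, beta) >= r] with beta = 1 / (1 + A/R), i.e. (4.2)."""
    beta = R / (A + R)
    n = a + r - 1
    return sum(math.comb(n, k) * beta**k * (1 - beta)**(n - k) for k in range(r, n + 1))


def power_series(a, r, A, I, R, m_max=1000):
    """phi from (4.1): total probability of stopping by rejection at stages m = r..m_max."""
    total = 0.0
    for m in range(r, m_max + 1):
        for x in range(min(a - 1, m - r) + 1):
            log_c = math.lgamma(m) - math.lgamma(r) - math.lgamma(x + 1) - math.lgamma(m - r - x + 1)
            total += math.exp(log_c + r * math.log(R) + x * math.log(A) + (m - r - x) * math.log(I))
    return total


print(power_closed(3, 2, 0.3, 0.2), power_series(3, 2, 0.3, 0.5, 0.2))
```

Because only $\beta$ enters, the power depends on the zones only through the ratio $A/R$.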
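By the use of the mgf (3.5), it is easily verified that the average sample number (ASN) is given by
$$u(f; a, r, S_1, S_3) = \frac{r}{R}\left[\varphi(f; a, r, S_1, S_3) - \binom{a + r - 1}{r}\beta^r(1 - \beta)^a\right] + \frac{a}{A}\left[1 - \varphi(f; a, r, S_1, S_3) - \binom{a + r - 1}{a}\beta^r(1 - \beta)^a\right]. \tag{4.3}$$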


This can also be shown to be
$$u(f; a, r, S_1, S_3) = \frac{r}{R}\int_0^\beta\frac{(a + r)!}{(a - 1)!\,r!}z^r(1 - z)^{a-1}\,dz + \frac{a}{A}\int_0^{1-\beta}\frac{(r + a)!}{(r - 1)!\,a!}z^a(1 - z)^{r-1}\,dz. \tag{4.4}$$
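Both forms of the ASN can be compared with an independent computation. The sketch below (hypothetical names, illustrative probabilities) evaluates (4.3) and (4.4) and also solves the first-step recursion for the expected number of draws, in which a draw in $S_2$ leaves the counts unchanged.

```python
import math


def reg_beta_int(z, p, q):
    """I_z(p, q) for positive integers p, q: P[Bin(p + q - 1, z) >= p]."""
    n = p + q - 1
    return sum(math.comb(n, k) * z**k * (1 - z)**(n - k) for k in range(p, n + 1))


def asn(a, r, A, R):
    """(4.4): (r/R) I_beta(r + 1, a) + (a/A) I_{1-beta}(a + 1, r)."""
    beta = R / (A + R)
    return r / R * reg_beta_int(beta, r + 1, a) + a / A * reg_beta_int(1 - beta, a + 1, r)


def asn_form_43(a, r, A, R):
    """(4.3), written in terms of the power phi = I_beta(r, a)."""
    beta = R / (A + R)
    phi = reg_beta_int(beta, r, a)
    tail = beta**r * (1 - beta)**a
    return (r / R * (phi - math.comb(a + r - 1, r) * tail)
            + a / A * (1 - phi - math.comb(a + r - 1, a) * tail))


def asn_exact(a, r, A, R):
    """Expected sample size from the recursion over the counts (m1, m3)."""
    I = 1.0 - A - R
    memo = {}

    def e(i, j):                       # expected further draws with i hits in S1, j in S3
        if i == a or j == r:
            return 0.0
        if (i, j) not in memo:         # e = 1 + A e(i+1, j) + R e(i, j+1) + I e
            memo[i, j] = (1.0 + A * e(i + 1, j) + R * e(i, j + 1)) / (1.0 - I)
        return memo[i, j]

    return e(0, 0)


print(asn(3, 2, 0.3, 0.2), asn_form_43(3, 2, 0.3, 0.2), asn_exact(3, 2, 0.3, 0.2))
```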
5. Optimum zones $S_1$, $S_2$, $S_3$. In testing the null hypothesis $f_0(x)$ against an alternative hypothesis $f_1(x)$, all four quantities
$$\varphi(f_0; a, r, S_1, S_3), \qquad \varphi(f_1; a, r, S_1, S_3), \qquad u(f_0; a, r, S_1, S_3), \qquad u(f_1; a, r, S_1, S_3)$$
are functions of the four parameters $a$, $r$, $S_1$, and $S_3$. Accordingly, there may be many ways of defining the optimum zones. However, in choosing a definition, we should take into consideration the following three requirements: (a) the definition itself should be reasonable from the statistician's point of view; (b) it must be realizable, that is, the optimum zones must exist; and (c) it can be put in a form suitable for applications.

Furthermore, if the pair of positive integers $(a, r)$ is preassigned, a set of optimum zones should be optimum (in some sense) among all possible sets $(S_1, S_2, S_3)$. If the pair $(a, r)$ is to be determined by the experimenter, then a set should be chosen so that it has certain optimum properties in the whole parameter space $\{(a, r, S_1, S_2, S_3)\}$, that is, it is optimum for all possible choices of pairs $(a, r)$ and all possible choices of sets $(S_1, S_2, S_3)$. In the following, we give two definitions, one for a fixed pair $(a, r)$ and the other for the general case. However, the determination of the optimum zones for the general case is so difficult that we shall only prove their existence.
For any given set $(\alpha, \gamma, a, r)$, where $0 < \alpha < \gamma < 1$, we shall denote by $\Omega_{\alpha,\gamma,a,r}$ the class of all possible sets of the three zones $(S_1, S_2, S_3)$ which satisfy the following two conditions:
$$\varphi(f_0; a, r, S_1, S_3) = \alpha, \qquad \varphi(f_1; a, r, S_1, S_3) = \gamma. \tag{5.1}$$
Here we have assumed that the class $\Omega_{\alpha,\gamma,a,r}$ is nonempty. A proof of the existence of such a class under certain general conditions will be given in Section 6.

A test is said to have the strength $(\alpha, \gamma)$ if its power function satisfies the two conditions (5.1). Thus every test based on a set of $\Omega_{\alpha,\gamma,a,r}$ has the strength $(\alpha, \gamma)$.

Definition I. A set $(S_1, S_2, S_3)$ of $\Omega_{\alpha,\gamma,a,r}$ is said to be relatively optimum with respect to $(a, r)$ if the inequalities
$$u(f_0; a, r, S_1, S_3) \le u(f_0; a, r, S_1', S_3'), \qquad u(f_1; a, r, S_1, S_3) \le u(f_1; a, r, S_1', S_3') \tag{5.2}$$
hold for all sets $(S_1', S_2', S_3') \in \Omega_{\alpha,\gamma,a,r}$. The three zones of a relative optimum set are called relative optimum zones.
To determine the relative optimum zones, we first need to prove the following two lemmas.

Lemma 2. For fixed $a$ and $r$, the ASN function $u(f; a, r, S_1, S_3)$ decreases as either $A$ or $R$ increases (the other being held fixed).
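Proof. Taking the partial derivatives of the ASN function (4.4) with respect to $R$ and $A$, we obtain
$$\frac{\partial u}{\partial R} = -\frac{r}{R^2}\int_0^\beta\frac{(a + r)!}{(a - 1)!\,r!}z^r(1 - z)^{a-1}\,dz, \tag{5.3}$$
$$\frac{\partial u}{\partial A} = -\frac{a}{A^2}\int_0^{1-\beta}\frac{(r + a)!}{(r - 1)!\,a!}z^a(1 - z)^{r-1}\,dz, \tag{5.4}$$
the terms arising from the dependence of $\beta$ on $A$ and $R$ cancelling. Since (5.3) and (5.4) are always negative, Lemma 2 is proved.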
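Lemma 2 can be illustrated by finite differences. The sketch below (hypothetical names, illustrative probabilities) perturbs $R$ or $A$ with the other held fixed, so that $I = 1 - A - R$ absorbs the change, and compares central differences of (4.4) with the derivatives (5.3) and (5.4).

```python
import math


def reg_beta_int(z, p, q):
    """I_z(p, q) for positive integers p, q: P[Bin(p + q - 1, z) >= p]."""
    n = p + q - 1
    return sum(math.comb(n, k) * z**k * (1 - z)**(n - k) for k in range(p, n + 1))


def asn(a, r, A, R):
    """ASN function (4.4)."""
    beta = R / (A + R)
    return r / R * reg_beta_int(beta, r + 1, a) + a / A * reg_beta_int(1 - beta, a + 1, r)


def partials(a, r, A, R, h=1e-6):
    """Central differences of u in R and in A, next to the formulas (5.3) and (5.4)."""
    beta = R / (A + R)
    du_dR = (asn(a, r, A, R + h) - asn(a, r, A, R - h)) / (2 * h)
    du_dA = (asn(a, r, A + h, R) - asn(a, r, A - h, R)) / (2 * h)
    formula_R = -r / R**2 * reg_beta_int(beta, r + 1, a)
    formula_A = -a / A**2 * reg_beta_int(1 - beta, a + 1, r)
    return du_dR, formula_R, du_dA, formula_A


print(partials(3, 2, 0.3, 0.2))
```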

Lemma 3. Suppose (a) $f_0(x)$ and $f_1(x)$ are continuous, (b) for every real number $c$, the probability measure of the set $\{x; f_1(x)/f_0(x) = c\}$ under either hypothesis is zero, and (c) the set $(S_1, S_2, S_3)$ defined by
$$S_1 = \{x; f_1(x)/f_0(x) \le k_0\}, \tag{5.5}$$
$$S_2 = \{x; k_0 < f_1(x)/f_0(x) < k_1\}, \tag{5.6}$$
$$S_3 = \{x; k_1 \le f_1(x)/f_0(x)\}, \tag{5.7}$$
where $k_0 \le k_1$ are two constants, belongs to $\Omega_{\alpha,\gamma,a,r}$. Then, for any set $(S_1', S_2', S_3')$ in $\Omega_{\alpha,\gamma,a,r}$, we have
$$A_0' \le A_0, \qquad R_0' \le R_0, \qquad A_1' \le A_1, \qquad R_1' \le R_1. \tag{5.8}$$
Proor. In order to prove Lemma 3, it is sufficient to prove (i) if Ao S Ao, 
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then Rj S Ro, Ai S A; and R; s R,, and (ii) under the given assumptions, 
the inequality Ag < Ao holds. 

First, assume Ao S Ao. Since both (S,, S2, S;) and (S;, S2, S3) are in 
Qa.y.a.r, then, by (4.2) and (5.1), we have 


(5.9) (a) AoR> = Ao/Ro, (b) AiR: = AiR. 


By (5.9a), Ao S Ao implies Ros R. By (5.7), Ro s Ro implies Ri sR; (using 
an argument of the Neyman-Pearson type). Finally, by (5.9b), Ri < R, implies 
Ai S Aj, proving (i). 

Next, assume $A_0' < A_0$. Then, by (5.9a), there exists a positive number $\delta$ such that

(5.10)  $A_0 = A_0' + \delta A_0'$,  $R_0 = R_0' + \delta R_0'$.

Therefore, by (5.5), (5.6) and (5.7), we must have

(5.11)  $A_1 > A_1' + \delta A_1'$,  $R_1 < R_1' + \delta R_1'$,

which imply that $A_1/R_1 > A_1'/R_1'$. Consequently, we obtain

(5.12)  $\varphi(f_1; a, r, S_1, S_3) < \varphi(f_1; a, r, S_1', S_3')$.

This contradicts the assumption that both $(S_1', S_2', S_3')$ and $(S_1, S_2, S_3)$ are members of $\Omega_{\alpha,\varphi,a,r}$. Hence the inequality $A_0' \ge A_0$ must hold. This proves (ii) and completes the proof of Lemma 3.
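The proof of (i) invokes "an argument of the Neyman-Pearson type". The claim is that a likelihood-ratio upper set such as (5.7) has the largest $P_1$-probability among all sets whose $P_0$-probability is no larger. The brute-force sketch below checks this on a six-point sample space. The two probability vectors are hypothetical and were chosen to have distinct likelihood ratios; the lemma itself works with continuous densities, where ties have probability zero.

```python
from itertools import combinations

# Hypothetical discrete "densities" on six points; f1/f0 is strictly increasing.
f0 = [0.30, 0.25, 0.20, 0.12, 0.08, 0.05]
f1 = [0.05, 0.10, 0.15, 0.20, 0.22, 0.28]
points = range(len(f0))

def prob(f, S):
    """Probability of the set S under the point masses f."""
    return sum(f[i] for i in S)

# Likelihood-ratio upper sets {x : f1(x)/f0(x) >= k}, the discrete analogue of (5.7).
order = sorted(points, key=lambda i: f1[i] / f0[i], reverse=True)
upper_sets = [set(order[:m]) for m in range(len(order) + 1)]

violations = 0
for S3_opt in upper_sets:
    R0_opt, R1_opt = prob(f0, S3_opt), prob(f1, S3_opt)
    for m in range(len(f0) + 1):
        for S in combinations(points, m):
            # A competitor with no larger P0-probability must not have a larger P1-probability.
            if prob(f0, S) <= R0_opt + 1e-12 and prob(f1, S) > R1_opt + 1e-12:
                violations += 1
print("violations:", violations)
```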

THEOREM 1. Under the conditions given in Lemma 3, consider testing the simple hypothesis $f_0(x)$ against the alternative hypothesis $f_1(x)$ with strength $(\alpha, \varphi)$. The set of the relative optimum zones $(S_1, S_2, S_3)$ with respect to $(a, r)$ is the set determined by (5.5), (5.6) and (5.7).

Theorem 1 follows from Lemmas 2 and 3. 

From Theorem 1, for each $(\alpha, \varphi, a, r)$ the set of relative optimum zones $(S_1, S_2, S_3)$, when it exists, is uniquely determined. In the following, we shall assume that the set of the relative optimum zones exists for every $(\alpha, \varphi, a, r)$.

We shall denote by $\Omega_{\alpha,\varphi}$ the class of all possible sets of the three zones whose tests all have strength $(\alpha, \varphi)$ for testing $f_0(x)$ against $f_1(x)$. That is, $\Omega_{\alpha,\varphi} = \bigcup_{a,r} \Omega_{\alpha,\varphi,a,r}$. To distinguish the sets in $\Omega_{\alpha,\varphi}$ from the sets in $\Omega_{\alpha,\varphi,a,r}$ for some fixed $(a, r)$, we shall write $(a, r, S_1, S_2, S_3)$ for the general set in $\Omega_{\alpha,\varphi}$.

We shall also denote by $\Omega_{\alpha,\varphi,0}$ the class of all sets of the relative optimum zones in $\Omega_{\alpha,\varphi}$. That is, $\Omega_{\alpha,\varphi,0}$ contains all sets $(a, r, S_1, S_2, S_3)$ where, for each pair $(a, r)$, the set $(S_1, S_2, S_3)$ is the set of relative optimum zones with respect to $(a, r)$.

A set $(a, r, S_1, S_2, S_3)$ of $\Omega_{\alpha,\varphi}$ is said to be comparable with another set $(a', r', S_1', S_2', S_3')$ of $\Omega_{\alpha,\varphi}$ if either the two inequalities

(5.13)  $\mu(f_0; a, r, S_1, S_3) \le \mu(f_0; a', r', S_1', S_3')$,  $\mu(f_1; a, r, S_1, S_3) \le \mu(f_1; a', r', S_1', S_3')$

hold simultaneously, or the two inequalities

(5.14)  $\mu(f_0; a, r, S_1, S_3) \ge \mu(f_0; a', r', S_1', S_3')$,  $\mu(f_1; a, r, S_1, S_3) \ge \mu(f_1; a', r', S_1', S_3')$

hold simultaneously. Otherwise, they are said to be noncomparable. Two comparable sets are said to be equivalent if all four inequalities in (5.13) and (5.14) hold simultaneously.

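LEMMA 4. Given any set $(a, r, S_1, S_2, S_3)$ in $\Omega_{\alpha,\varphi}$, there is a set $(a', r', S_1', S_2', S_3')$ in $\Omega_{\alpha,\varphi,0}$ such that the two inequalities (5.14) hold simultaneously.

The proof is trivial. We can always choose $a' = a$ and $r' = r$, and take $(S_1', S_2', S_3')$ to be the relative optimum zones with respect to $(a, r)$. Then (5.14) follows from Lemmas 2 and 3.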

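LEMMA 5. For any set $(a, r, S_1, S_2, S_3)$ in $\Omega_{\alpha,\varphi}$, the number of sets $(a', r', S_1', S_2', S_3')$ in $\Omega_{\alpha,\varphi,0}$ satisfying the two inequalities (5.14) is finite.

PROOF. For any set $(a', r', S_1', S_2', S_3')$ of $\Omega_{\alpha,\varphi,0}$ (or of $\Omega_{\alpha,\varphi}$ in general), the following two inequalities must hold:

(5.15)  $\mu(f_0; a', r', S_1', S_3') \ge r'\alpha + a'(1 - \alpha)$,  $\mu(f_1; a', r', S_1', S_3') \ge r'\varphi + a'(1 - \varphi)$.

They hold because at least $r'$ observations are needed to reject and at least $a'$ to accept. But for any set $(a, r, S_1, S_2, S_3)$ in $\Omega_{\alpha,\varphi}$, the two quantities $\mu(f_0; a, r, S_1, S_3)$ and $\mu(f_1; a, r, S_1, S_3)$ are finite. By (5.14) and (5.15), therefore, only finitely many pairs $(a', r')$ are possible. Lemma 5 then follows from the uniqueness of the set of relative optimum zones for each $(a', r')$.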

From Lemmas 4 and 5, it is clear that for each set $(a, r, S_1, S_2, S_3)$ in $\Omega_{\alpha,\varphi}$ there exists a comparable set $(a^*, r^*, S_1^*, S_2^*, S_3^*)$ in $\Omega_{\alpha,\varphi,0}$ such that the two inequalities

(5.16)  $\mu(f_0; a^*, r^*, S_1^*, S_3^*) \le \mu(f_0; a, r, S_1, S_3)$,  $\mu(f_1; a^*, r^*, S_1^*, S_3^*) \le \mu(f_1; a, r, S_1, S_3)$

hold for all sets $(a, r, S_1, S_2, S_3)$ in $\Omega_{\alpha,\varphi}$ that are comparable with $(a^*, r^*, S_1^*, S_2^*, S_3^*)$. Denoting by $\Omega^*_{\alpha,\varphi,0}$ the class of all such sets $(a^*, r^*, S_1^*, S_2^*, S_3^*)$ in $\Omega_{\alpha,\varphi,0}$, we may conclude:

THEOREM 2. The class $\Omega^*_{\alpha,\varphi,0}$ is a subclass of $\Omega_{\alpha,\varphi,0}$. Two distinct sets in $\Omega^*_{\alpha,\varphi,0}$ are either equivalent or noncomparable.

The class $\Omega^*_{\alpha,\varphi,0}$ may be called the class of sets of the absolute optimum zones. There may be many sets of the absolute optimum zones, and determining any such set is difficult. Throughout the remaining part of this paper we shall therefore assume that $a$ and $r$ are preassigned and that the three zones are chosen according to (5.5), (5.6) and (5.7). We shall denote by $\varphi(f)$ and $\mu(f)$ the power and the ASN functions of the test when the three zones are so chosen.

We have seen that, for each $(\alpha, \varphi, a, r)$, the set of the relative optimum zones $(S_1, S_2, S_3)$, when it exists, is uniquely determined. On the other hand, it is easily seen that for each $(\alpha, a, r)$ there are infinitely many sets of the relative optimum zones $(S_1, S_2, S_3)$. We shall denote by $\Omega_{\alpha,a,r,0}$ the class of all such sets of relative optimum zones, that is,

$\Omega_{\alpha,a,r,0} = \bigcup \{\Omega_{\alpha,\varphi,a,r};\ \alpha < \varphi < 1\}$.

Consider the test based on a preassigned $(\alpha, a, r)$ and a set $(S_1, S_2, S_3)$ with $(S_1, S_2, S_3) \in \Omega_{\alpha,a,r,0}$. This test will be called the R. O. (relatively optimum) test with respect to $(a, r)$ for fixed level of significance $\alpha$, or simply the R. O. test.
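The Monte Carlo sketch below runs the three-zone procedure on abstract zone labels and compares the results with closed forms. It assumes the stopping rule implied by (4.2) and (5.15): reject $f_0$ once $r$ observations have fallen in $S_3$, accept once $a$ have fallen in $S_1$, and keep sampling on observations in $S_2$. Under this rule the power is $I_\beta(r, a)$ and the ASN is $(r/R)\, I_\beta(r+1, a) + (a/A)\, I_{1-\beta}(a+1, r)$, with $\beta = R/(A+R)$. The zone probabilities $A = .3$, $R = .2$ and the choice $a = 2$, $r = 3$ are illustrative.

```python
import random
from math import comb

def inc_beta(x, p, q):
    """I_x(p, q) for positive integers p, q via a binomial tail."""
    m = p + q - 1
    return sum(comb(m, j) * x**j * (1 - x) ** (m - j) for j in range(p, m + 1))

def run_test(A, R, a, r, rng):
    """One run of the assumed three-zone sequential test; returns (rejected, n_obs)."""
    c1 = c3 = n = 0
    while c1 < a and c3 < r:
        n += 1
        u = rng.random()
        if u < A:            # observation falls in S1
            c1 += 1
        elif u < A + R:      # observation falls in S3
            c3 += 1
        # otherwise it falls in S2 and sampling continues
    return c3 == r, n

A, R, a, r = 0.3, 0.2, 2, 3
rng = random.Random(12345)
runs = [run_test(A, R, a, r, rng) for _ in range(20000)]
power_mc = sum(rej for rej, _ in runs) / len(runs)
asn_mc = sum(n for _, n in runs) / len(runs)

beta = R / (A + R)
power_exact = inc_beta(beta, r, a)
asn_exact = (r / R) * inc_beta(beta, r + 1, a) + (a / A) * inc_beta(1 - beta, a + 1, r)
print(f"power: simulated {power_mc:.4f}, formula {power_exact:.4f}")
print(f"ASN:   simulated {asn_mc:.3f}, formula {asn_exact:.3f}")
```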


6. Consistency and existence of $\Omega_{\alpha,\varphi,a,r}$. In Section 5, we assumed that the class $\Omega_{\alpha,\varphi,a,r}$ is nonempty for any given set $(\alpha, \varphi, a, r)$. This assumption is valid only when $f_0(x)$ and $f_1(x)$ satisfy certain general conditions. On the other hand, the consistency of the R. O. test depends on the existence of such classes.

Suppose $a$ and $r$ are preassigned positive integers, and suppose the hypothesis $f_0(x)$ is to be tested against the alternative hypothesis $f_1(x)$. Again, we shall assume that $f_0(x)$ and $f_1(x)$ are continuous and that, for every real number $c$, the probability measure of the set $\{x;\ f_1(x)/f_0(x) = c\}$ is zero under either hypothesis. Let

(6.1)  $(\alpha, \varphi_1), (\alpha, \varphi_2), (\alpha, \varphi_3), \ldots, \qquad 0 < \alpha < \varphi_i < 1, \quad i = 1, 2, 3, \ldots,$

be a sequence of pairs of real numbers. Suppose there exists a sequence of sets of the relative optimum zones

(6.2)  $(S_{11}, S_{21}, S_{31}), (S_{12}, S_{22}, S_{32}), (S_{13}, S_{23}, S_{33}), \ldots, \qquad (S_{1i}, S_{2i}, S_{3i}) \in \Omega_{\alpha,\varphi_i,a,r}, \quad i = 1, 2, \ldots,$

such that the corresponding sequence of R. O. tests has (6.1) as its sequence of strengths for testing $f_0(x)$ against $f_1(x)$. Let the corresponding sequences of ASN functions be

(6.3)  $\mu_1(f_i), \mu_2(f_i), \mu_3(f_i), \ldots, \qquad i = 0, 1.$


We shall say that the sequence of R. O. tests is conditionally consistent if, for any alternative $f_1(x)$, the sequences of inequalities

(6.4)  $\mu_1(f_i) < \mu_2(f_i) < \mu_3(f_i) < \cdots, \qquad i = 0, 1,$

imply the sequence of inequalities

(6.5)  $\varphi_1 < \varphi_2 < \varphi_3 < \cdots.$

This definition is equivalent to

DEFINITION II. The R. O. test is said to be conditionally consistent if, for any fixed level of significance $\alpha$ and any alternative $f_1(x)$, the power function $\varphi(f_1)$ of the R. O. test increases whenever the ASN functions $\mu(f_i)$, for $i = 0$ or 1, increase.

The following two lemmas apply to the R. O. tests.

LEMMA 6. For a fixed level of significance $\alpha$ and a fixed alternative $f_1(x)$, the power function $\varphi(f_1)$ is a monotone increasing function of $I_0$.

PROOF. From (4.2), proving Lemma 6 is equivalent to the following claim under the given conditions. Let $(S_1, S_2, S_3)$ and $(S_1', S_2', S_3')$ be two different sets of the relative optimum zones with $I_0' > I_0$. Then $A_1'/R_1' < A_1/R_1$.

Since the level of significance $\alpha$ remains fixed, the equality (5.9a) holds by (4.2). Therefore, there exists $0 < \rho < 1$ such that

(6.6)  $A_0' = \rho A_0$,  $R_0' = \rho R_0$.

By (5.5) and (5.7), these equalities imply that

(6.7)  $A_1' < \rho A_1$,  $R_1' > \rho R_1$.

Consequently, we obtain

(6.8)  $A_1'/R_1' < A_1/R_1$,

which completes the proof of Lemma 6.

LEMMA 7. For a preassigned level of significance $\alpha$, the ASN functions $\mu(f_i)$, $i = 0$ or 1, are monotone increasing functions of $I_0$.

PROOF. Increasing $I_0$ decreases $S_1$ and $S_3$, and therefore decreases $A_0$, $R_0$, $A_1$ and $R_1$. Lemma 7 then follows from Lemma 2.

THEOREM 3. The R. O. test is conditionally consistent.

This theorem follows directly from Lemmas 6 and 7. 
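Theorem 3 can be illustrated numerically. The sketch below assumes normal observations with $\theta_0 = 0$, $\theta_1 = 1$ and $\sigma = 1$, zones $S_1 = (-\infty, x_1)$ and $S_3 = (x_2, \infty)$ as in Section 7, $a = r = 1$ and $\alpha = .05$. With $a = r = 1$ the test stops at the first observation outside $S_2$. The power is then $R_i/(A_i + R_i)$, the ASN is $1/(A_i + R_i)$, and the level condition reads $A_0 = 19R_0$. Letting $R_0$ decrease enlarges $I_0 = 1 - 20R_0$. As Lemmas 6 and 7 assert, the power $\varphi(f_1)$ and both ASN values should then increase together.

```python
from statistics import NormalDist

Z = NormalDist()                   # standard normal
theta0, theta1, alpha = 0.0, 1.0, 0.05
ratio = (1 - alpha) / alpha        # A0/R0 forced by the level condition when a = r = 1

rows = []
for k in range(50, 0, -1):         # R0 = 0.050, 0.049, ..., 0.001
    R0 = k / 1000
    A0 = ratio * R0
    x1 = theta0 + Z.inv_cdf(A0)        # S1 = (-inf, x1)
    x2 = theta0 + Z.inv_cdf(1 - R0)    # S3 = (x2, inf)
    A1 = Z.cdf(x1 - theta1)
    R1 = 1 - Z.cdf(x2 - theta1)
    I0 = 1 - A0 - R0
    rows.append((I0, R1 / (A1 + R1), 1 / (A0 + R0), 1 / (A1 + R1)))

for I0, power, mu0, mu1 in rows[::10]:
    print(f"I0={I0:.3f}  phi(f1)={power:.4f}  mu(f0)={mu0:.2f}  mu(f1)={mu1:.2f}")
```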

Conditional consistency is a rather weak property. It does not assure us that 
as the average sample number approaches infinity, the power of the R. O. test 
approaches one. Hence, a stronger property is desirable. 

DEFINITION II'. The R. O. test will be said to be absolutely consistent if, for every fixed level of significance $\alpha$ and every given alternative $f_1(x)$, the power function $\varphi(f_1)$ tends to 1 as the ASN function $\mu(f_1)$ tends to $\infty$.

Although the R. O. test is conditionally consistent, it may not be absolutely consistent. We shall verify this assertion by an example, but first we state an obvious but useful lemma.


LEMMA 8. For a fixed level of significance $\alpha$, let $(S_1, S_2, S_3)$ be a set of relative optimum zones and let $(S_1', S_2', S_3')$ be any other set of the three zones such that $I_0' = I_0$. Then we have

(6.9)  $\varphi(f_1; a, r, S_1, S_3) \ge \varphi(f_1; a, r, S_1', S_3')$.

This lemma is obviously true by (4.2), (5.5), (5.6), (5.7) and (5.9a).
The following example shows that the R. O. test can be conditionally consistent without being absolutely consistent. Let a class of pdf's be given as follows:

(6.10)  $f_\theta(x) = \theta + 2(1 - \theta)x, \qquad 0 \le x \le 1, \quad 0 \le \theta \le 2.$

Let the hypothesis

(6.11)  $H_0: \theta = 1$

be tested against the alternative hypothesis

(6.12)  $H_1: \theta = \theta_1, \qquad 0 < \theta_1 < 1.$

This is equivalent to testing the uniform density $f_0(x) = 1$ against the alternative $f_1(x) = \theta_1 + 2(1 - \theta_1)x$. The ratio $f_1(x)/f_0(x) = f_1(x)$ is a monotone increasing function of $x$. Hence any set of the relative optimum zones will have the form $S_1 = (0, x)$, $S_2 = (x, x')$ and $S_3 = (x', 1)$. Furthermore, for fixed $\alpha$, $S_1$ and $S_3$ must together satisfy the equality

(6.13)  $R_0 = \lambda A_0$,

where $\lambda$ is determined so that $\varphi(f_0) = \alpha$. The points $x$ and $x'$ must therefore satisfy

(6.14)  $x' = 1 - \lambda x$.


As a result, we obtain, by (4.2),

(6.15)  $\beta_1 = \dfrac{1}{1 + A_1/R_1} = \dfrac{\lambda(2 - \theta_1) - \lambda^2(1 - \theta_1)x}{\theta_1 + \lambda(2 - \theta_1) + (1 - \theta_1)(1 - \lambda^2)x}$.

Taking the limit of $\beta_1$ in (6.15), we obtain

(6.16)  $\lim_{I_0 \to 1} \beta_1 = \lim_{x \to 0} \beta_1 = \dfrac{\lambda(2 - \theta_1)}{\theta_1 + \lambda(2 - \theta_1)} = \beta^* < 1$.

Hence we have

(6.17)  $\varphi(f_1) < \varphi^*$ for $0 \le I_0 < 1$, where $\varphi^* = \dfrac{(a + r - 1)!}{(a - 1)!\,(r - 1)!} \int_0^{\beta^*} x^{r-1}(1 - x)^{a-1}\, dx < 1$.

Thus, by Lemma 8, for a given set $(\alpha, \varphi, a, r)$ with $\varphi > \varphi^*$, the class $\Omega_{\alpha,\varphi,a,r}$ is empty. In other words, for the given pair $(a, r)$ there is no set of the three zones giving strength $(\alpha, \varphi)$ for testing $f_0(x)$ against $f_1(x)$. Therefore, the R. O. test cannot be absolutely consistent, though it is always conditionally consistent.
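The bound (6.17) is easy to evaluate. The sketch below takes $\lambda = \beta_0/(1 - \beta_0)$. This follows from (6.13), because $\beta_0 = R_0/(A_0 + R_0) = \lambda/(1 + \lambda)$. The value $\beta_0$ is obtained from $I_{\beta_0}(r, a) = \alpha$ by bisection, where $I$ denotes the incomplete beta function ratio appearing in (6.17). The sketch then computes $\beta^*$ from (6.16) and the power ceiling $\varphi^*$. The values $\alpha = .05$ and $\theta_1 = .5$ are illustrative.

```python
from math import comb

def inc_beta(x, p, q):
    """I_x(p, q) for positive integers p, q (binomial-tail identity)."""
    m = p + q - 1
    return sum(comb(m, j) * x**j * (1 - x) ** (m - j) for j in range(p, m + 1))

def solve_beta(target, a, r, iters=100):
    """beta with I_beta(r, a) = target; I_beta(r, a) increases with beta."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if inc_beta(mid, r, a) < target else (lo, mid)
    return (lo + hi) / 2

def beta1(x, lam, th1):
    """Equation (6.15): beta_1 for the zones S1 = (0, x), S3 = (1 - lam*x, 1)."""
    num = lam * (2 - th1) - lam**2 * (1 - th1) * x
    den = th1 + lam * (2 - th1) + (1 - th1) * (1 - lam**2) * x
    return num / den

alpha, th1 = 0.05, 0.5
results = {}
for a, r in [(1, 1), (2, 2), (3, 3)]:
    b0 = solve_beta(alpha, a, r)
    lam = b0 / (1 - b0)                                 # (6.13): R0 = lam * A0
    b_star = lam * (2 - th1) / (th1 + lam * (2 - th1))  # limit (6.16)
    phi_star = inc_beta(b_star, r, a)                   # ceiling (6.17)
    results[(a, r)] = (lam, b_star, phi_star)
    print(f"a=r={a}: lambda={lam:.4f} beta*={b_star:.4f} phi*={phi_star:.4f}")
```

For $a = r = 1$ this gives $\beta^* = \varphi^* = 3/22 \approx .136$. With $\theta_1 = .5$, no zones of the stated form can therefore reach a power such as .95 with that choice of $(a, r)$.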

In the following, we give a necessary and sufficient condition for the existence of $\Omega_{\alpha,\varphi,a,r}$ for an arbitrary set $(\alpha, \varphi, a, r)$. We also give a necessary and sufficient condition for the absolute consistency of the R. O. test.

THEOREM 4. Let $\beta_i = 1/(1 + A_i/R_i)$, $i = 0$ or 1, be determined from (4.2) so that the R. O. test would have strength $(\alpha, \varphi)$ for testing $f_0(x)$ against $f_1(x)$. A necessary and sufficient condition for the existence of $\Omega_{\alpha,\varphi,a,r}$ is that there exists a number $k > 0$ such that:

(A) the probability measure of the set $w_1 = \{x;\ f_1(x)/f_0(x) \le k\beta_0/(1 - \beta_0)\}$ is positive under either hypothesis; and

(B) the probability measure of the set $w_3 = \{x;\ f_1(x)/f_0(x) \ge k\beta_1/(1 - \beta_1)\}$ is positive under either hypothesis.

PROOF. i) Sufficiency. The power functions are continuous under the assumptions. From Lemmas 6 and 8, it is therefore sufficient to prove the existence of $\Omega_{\alpha,\varphi',a,r}$ for some $\varphi' \ge \varphi$. That is, it is sufficient to find a set of relative optimum zones $(S_1, S_2, S_3)$ satisfying

(6.18)  (a) $A_0/R_0 = (1 - \beta_0)/\beta_0$,  (b) $A_1/R_1 \le (1 - \beta_1)/\beta_1$.

If conditions (A) and (B) hold, we can choose a subset $S_1 \subset w_1$ and a subset $S_3 \subset w_3$ such that (6.18a) is satisfied. Consequently, we have

(6.19)  $A_1 \le A_0 k\beta_0/(1 - \beta_0)$,  $R_1 \ge R_0 k\beta_1/(1 - \beta_1)$.

Therefore, the inequality (6.18b) holds.
ii) Necessity. Conversely, if the class $\Omega_{\alpha,\varphi,a,r}$ exists, then by Lemmas 6 and 8 there exists a set of relative optimum zones $(S_1, S_2, S_3)$ such that

(6.20)  $A_0/R_0 = (1 - \beta_0)/\beta_0$,  $A_1/R_1 \le (1 - \beta_1)/\beta_1$

hold. Consequently, there exists a number $k > 0$ such that

(6.21)  $A_1/A_0 \le k\beta_0/(1 - \beta_0)$,  $R_1/R_0 = k\beta_1/(1 - \beta_1)$.

Therefore, there exist subsets $w_1 \subset S_1$ and $w_3 \subset S_3$ such that conditions (A) and (B) are true.

THEOREM 5. A necessary and sufficient condition for the absolute consistency of the R. O. test is that at least one of the following two conditions is true:

(A') for every positive $\epsilon$, the probability measure of the set $w_1' = \{x;\ f_1(x)/f_0(x) \le \epsilon\}$ is positive under either hypothesis;

(B') for every positive $\epsilon'$, the probability measure of the set $w_3' = \{x;\ f_1(x)/f_0(x) \ge \epsilon'\}$ is positive under either hypothesis.

PROOF. i) Sufficiency. By (4.4), the ASN function $\mu(f_1)$ tends to infinity only if at least one of the two quantities $A_1$ and $R_1$ tends to zero. Hence, by (4.2), to prove sufficiency it is enough to show the following: for every given level of significance $\alpha$ and every alternative $f_1(x)$, the ratio $A_1/R_1$ tends to zero as $R_1$ tends to zero.

For a fixed $\alpha$, the ratio $A_0/R_0$ remains fixed. Let

(6.22)  $A_0/R_0 = d$,

where $d$ is a constant. By (5.7), $R_1 \to 0$ implies $R_0 \to 0$. By (6.22), $R_0 \to 0$ implies $A_0 \to 0$. Finally, by (5.5), $A_0 \to 0$ implies $A_1 \to 0$. Furthermore, if (A') is true, then

(6.23)  $\lim_{R_1 \to 0} R_1/R_0 \ge 1$,  $\lim_{R_1 \to 0} A_1/A_0 = 0$.

If (B') is true, then

(6.24)  $\lim_{R_1 \to 0} R_1/R_0 = \infty$,

while $A_1/A_0 \le 1$ because $S_1$ is a lower likelihood-ratio set. Consequently, in either case we obtain

(6.25)  $\lim_{R_1 \to 0} A_1/R_1 = \lim_{R_1 \to 0} d\,(A_1/R_1)(R_0/A_0) = d \lim_{R_1 \to 0} \dfrac{A_1/A_0}{R_1/R_0} = 0$.

ii) Necessity. The necessity is easily proved by contradiction. Assume that neither (A') nor (B') is true. Then the ratio $f_1(x)/f_0(x)$ must be bounded above and bounded away from zero. Let

(6.26)  $L = \text{g.l.b.}\,\{f_1(x)/f_0(x)\}$,  $U = \text{l.u.b.}\,\{f_1(x)/f_0(x)\}$.

Choose a set $(\alpha, \varphi, a, r)$ such that

(6.27)  $\beta_0/(1 - \beta_0) < L$,  $\beta_1/(1 - \beta_1) > U$,

where, again, $\beta_0$ and $\beta_1$ are determined from (4.2) so that the R. O. test has strength $(\alpha, \varphi)$. By (6.27), $[\beta_1/(1 - \beta_1)]/[\beta_0/(1 - \beta_0)] > U/L$. Hence no $k > 0$ makes (A) and (B) in Theorem 4 hold simultaneously, and the class $\Omega_{\alpha,\varphi,a,r}$ is empty. This contradicts the assumption that the R. O. test is absolutely consistent. Thus, the necessity of either (A') or (B') is established.

From Theorem 5, the R. O. test cannot be absolutely consistent unless (A') or (B') is satisfied. However, for practical purposes one may modify the procedure and so obtain an R. O. test with a specified strength $(\alpha, \varphi)$. Two of the possible modifications are the following.

(a) Increasing $a$ and/or $r$. The power function has the form of the incomplete beta function. Thus, for an arbitrary pair $(\alpha, \varphi)$, it may be possible to decrease the difference between $\beta_0$ and $\beta_1$ by increasing $a$ and/or $r$, so that conditions (A) and (B) in Theorem 4 are satisfied for some $k > 0$.

(b) Taking the observations in groups. When observations are taken in groups of size $n$, one may apply the R. O. test to some appropriate statistic so that the test has the specified strength $(\alpha, \varphi)$. This works because, for some appropriate $n$, the pdf's of the statistic under the null and the alternative hypotheses may satisfy the condition in Theorem 4. Usually this is true when $n$ is sufficiently large.
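Modification (a) can be made concrete. The sketch below solves $I_{\beta}(r, a) = \alpha$ and $I_{\beta}(r, a) = \varphi$ by bisection, for $a = r = 1, \ldots, 6$ and the strength $(.05, .95)$. It reports the gap $\beta_1 - \beta_0$ and the ratio of the two likelihood-ratio thresholds $k\beta_1/(1 - \beta_1)$ and $k\beta_0/(1 - \beta_0)$ in Theorem 4. This ratio does not depend on $k$, and conditions (A) and (B) become easier to meet as it falls. Since $I_\beta(r, r) = 1 - I_{1-\beta}(r, r)$, one expects $\beta_1 = 1 - \beta_0$ whenever $a = r$ and $\varphi = 1 - \alpha$.

```python
from math import comb

def inc_beta(x, p, q):
    """I_x(p, q) for positive integers p, q (binomial-tail identity)."""
    m = p + q - 1
    return sum(comb(m, j) * x**j * (1 - x) ** (m - j) for j in range(p, m + 1))

def solve_beta(target, a, r, iters=100):
    """beta with I_beta(r, a) = target, by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if inc_beta(mid, r, a) < target else (lo, mid)
    return (lo + hi) / 2

alpha, phi = 0.05, 0.95
table = []
for n in range(1, 7):                                  # a = r = n
    b0, b1 = solve_beta(alpha, n, n), solve_beta(phi, n, n)
    ratio = (b1 / (1 - b1)) / (b0 / (1 - b0))          # threshold ratio in Theorem 4
    table.append((n, b0, b1, ratio))
    print(f"a=r={n}: beta0={b0:.4f} beta1={b1:.4f} gap={b1 - b0:.4f} ratio={ratio:.1f}")
```

For $a = r = 1$ the threshold ratio is $19^2 = 361$, and it decreases steadily as $a = r$ grows.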


7. Applications. The R. O. test procedure may have a wide variety of applications. In testing a simple hypothesis, the procedure is applicable whenever the pdf under the null hypothesis and the ratio of the pdf's under the null and the alternative hypotheses can be determined, especially when the condition in Theorem 5 is also satisfied. For example, let $n(x; \theta, \sigma)$ be the pdf of a normal distribution, that is,

$n(x; \theta, \sigma) = \dfrac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\dfrac{(x - \theta)^2}{2\sigma^2}\right), \qquad -\infty < x < \infty,$


where $\sigma$ is known, and let the hypothesis $H_0: \theta = \theta_0$ be tested against the alternative hypothesis $H_1: \theta > \theta_0$. For $\theta > \theta_0$, the ratio $n(x; \theta, \sigma)/n(x; \theta_0, \sigma)$ is a monotone increasing function of $x$, and both (A') and (B') in Theorem 5 are satisfied. Hence we can apply the R. O. test by taking the three intervals $(-\infty, x_1)$, $(x_1, x_2)$ and $(x_2, \infty)$ as $S_1$, $S_2$ and $S_3$. For fixed "$a$" and "$r$", $x_1$ and $x_2$ are determined so that the R. O. test has a preassigned strength $(\alpha, \varphi)$ for testing $\theta_0$ against some alternative $\theta_1 > \theta_0$.

For a fixed level of significance $\alpha$, $x_1$ is a monotone decreasing function of $\varphi$ and $x_2$ is a monotone increasing function of $\varphi$. The values of $x_1$ and $x_2$ can therefore easily be found by trial and error. For instance, if $(\alpha, \varphi, a, r) = (.05, .95, 1, 1)$, then $x_1$ and $x_2$ should be determined so that the following two equalities are satisfied:


(7.1)  $1/(1 + A_0/R_0) = .05$,  $1/(1 + A_1/R_1) = .95$,

where $A_0$, $A_1$, $R_0$ and $R_1$ are given by

(7.2)  $A_i = \int_{-\infty}^{x_1} n(x; \theta_i, \sigma)\, dx$,  $R_i = \int_{x_2}^{\infty} n(x; \theta_i, \sigma)\, dx$,  $i = 0, 1$.

Thus, if $\theta_1 = \theta_0 + 2\sigma$, a normal probability table gives the approximate values $x_1 = \theta_0 + .0930\sigma$ and $x_2 = \theta_0 + 1.9076\sigma$.
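The trial-and-error determination of $x_1$ and $x_2$ can be automated. The sketch below is standardized so that $\theta_0 = 0$, $\sigma = 1$ and $\theta_1 = 2$. For a given $x_1$, it first fixes $x_2$ from the level condition $A_0/R_0 = 19$. It then bisects on $x_1$ until $A_1/R_1 = 1/19$. Two things are assumptions of this sketch: the bracket $[\theta_0 - \sigma, \theta_0 + \sigma]$, and the use of Python's `statistics.NormalDist` in place of a printed probability table. The solution should land close to the values quoted above.

```python
from statistics import NormalDist

Z = NormalDist()
theta0, theta1, sigma = 0.0, 2.0, 1.0
beta0, beta1 = 0.05, 0.95          # for a = r = 1 the power equals beta, so (7.1) fixes these
k0 = (1 - beta0) / beta0           # required A0/R0
k1 = (1 - beta1) / beta1           # required A1/R1

def x2_from_x1(x1):
    """Choose x2 so that A0/R0 = k0 exactly, given S1 = (-inf, x1)."""
    A0 = Z.cdf((x1 - theta0) / sigma)
    return theta0 + sigma * Z.inv_cdf(1 - A0 / k0)

def excess(x1):
    """A1/R1 - k1 under theta1; negative at the left end of the bracket, positive at the right."""
    x2 = x2_from_x1(x1)
    A1 = Z.cdf((x1 - theta1) / sigma)
    R1 = 1 - Z.cdf((x2 - theta1) / sigma)
    return A1 / R1 - k1

lo, hi = theta0 - sigma, theta0 + sigma
for _ in range(80):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if excess(mid) < 0 else (lo, mid)
x1 = (lo + hi) / 2
x2 = x2_from_x1(x1)
print(f"x1 = theta0 + {(x1 - theta0) / sigma:.4f} sigma, x2 = theta0 + {(x2 - theta0) / sigma:.4f} sigma")
```

By symmetry about the midpoint $\theta_0 + \sigma$, the solution satisfies $x_1 + x_2 = 2\theta_0 + 2\sigma$.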

If a composite hypothesis is to be tested, one may sometimes still apply the procedure, provided the observations can be taken in groups and a similar region can be found. For instance, the central $t$-distribution may be used in testing the location of the mean of a normal distribution with unknown variance. Likewise, the $\chi^2$ distribution may be used in testing the variance of a normal distribution with unknown mean.

The following two examples illustrate the application of the test to nonparametric and multisample problems.

EXAMPLE 1. (Test of the location of the median of a population.) Suppose we wish to test whether the median $\nu$ of a population is equal to or greater than $\nu_0$. One can take the observations in groups of size $n$ and call an observation 0 if it is less than $\nu_0$ and 1 otherwise. Under the null hypothesis, the sum $X$ of the observations in a group has the binomial density $f(x) = \binom{n}{x}(1/2)^n$ for $x = 0, 1, 2, \ldots, n$. By grouping the $n + 1$ points $(0, 1, 2, \ldots, n)$ into three different zones, the proposed test becomes applicable.
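For Example 1, the null zone probabilities follow directly from the binomial density. The sketch below uses an illustrative group size $n = 10$ and hypothetical zones $S_1 = \{X \le 5\}$ and $S_3 = \{X \ge 9\}$. It also evaluates the zones under an assumed alternative in which an observation is at least $\nu_0$ with probability .7. None of these choices comes from the example itself.

```python
from math import comb

def zone_probs(n, lo_cut, hi_cut, p_one):
    """Zone probabilities for X ~ Binomial(n, p_one), with
    S1 = {X <= lo_cut}, S3 = {X >= hi_cut} and S2 the points in between."""
    pmf = [comb(n, x) * p_one**x * (1 - p_one) ** (n - x) for x in range(n + 1)]
    A = sum(pmf[: lo_cut + 1])
    R = sum(pmf[hi_cut:])
    return A, 1 - A - R, R

n, lo_cut, hi_cut = 10, 5, 9                      # hypothetical group size and cut points
A0, I0, R0 = zone_probs(n, lo_cut, hi_cut, 0.5)   # null: f(x) = C(n, x) (1/2)^n
A1, I1, R1 = zone_probs(n, lo_cut, hi_cut, 0.7)   # assumed alternative
print(f"null:        A0={A0:.4f} I0={I0:.4f} R0={R0:.4f}")
print(f"alternative: A1={A1:.4f} I1={I1:.4f} R1={R1:.4f}")
```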

EXAMPLE 2. (Comparison of two populations.) Suppose $X_1 < X_2 < \cdots < X_n$ and $Y_1 < Y_2 < \cdots < Y_m$ are the ordered results of two random samples from populations with continuous cumulative distribution functions $F(x)$ and $G(x)$, respectively. Let $s_1, s_2, \ldots, s_n$ be the ranks of the observations of $X$ in the combined sample, and let $W = s_1 + s_2 + \cdots + s_n$. Denote by $h(x)$ the pdf of the random variable $W$. Let the hypothesis $H_0: F(x) = G(x)$ be tested against the alternative hypothesis $H_1: F(x) > G(x)$.

Since the density $h_0(x)$ of $W$ under $H_0$ is known, one may apply the test procedure as follows. Choose two positive integers $a$ and $r$. Decide on two numbers $w'$ and $w''$ such that

(7.3)  $\Pr(W < w' \mid H_0) = A_0$,  $\Pr(w' \le W \le w'' \mid H_0) = I_0$,  $\Pr(W > w'' \mid H_0) = R_0$,

and such that the pair $(\varphi(f_0), \mu(f_0))$ satisfies certain conditions. Continue to draw samples of sizes $(n, m)$. At each stage, count the number of times that $W < w'$, $w' \le W \le w''$ and $W > w''$, and denote these numbers by $c_1$, $c_2$ and $c_3$. The proposed test is then applicable.

We note that the procedures used in Examples 1 and 2 are not necessarily optimal. They are given here as possible applications of the proposed procedure in general.
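Example 2 relies on the null density of $W$ being available exactly. Under $H_0$ every set of $n$ ranks out of $\{1, \ldots, n + m\}$ is equally likely. The null probability of $W = w$ is therefore the number of such sets summing to $w$, divided by $\binom{n+m}{n}$. The dynamic-programming sketch below computes this distribution. It then picks cut points $w'$ and $w''$ for the illustrative targets $A_0 \ge .60$ and $R_0 \ge .03$. The sample sizes $n = m = 5$, the targets and the helper names are assumptions.

```python
from math import comb

def rank_sum_null(n, m):
    """Exact H0 distribution of W, the sum of the n X-ranks among n + m pooled ranks."""
    N = n + m
    max_w = sum(range(m + 1, N + 1))              # largest possible rank sum
    ways = [[0] * (max_w + 1) for _ in range(n + 1)]
    ways[0][0] = 1                                # the empty set of ranks has sum 0
    for rank in range(1, N + 1):
        for k in range(min(rank, n), 0, -1):      # descending k: each rank used at most once
            for w in range(rank, max_w + 1):
                ways[k][w] += ways[k - 1][w - rank]
    total = comb(N, n)                            # all rank sets equally likely under H0
    return {w: ways[n][w] / total for w in range(max_w + 1) if ways[n][w]}

def prob_below(dist, w):
    """Pr(W < w | H0)."""
    return sum(p for v, p in dist.items() if v < w)

def prob_above(dist, w):
    """Pr(W > w | H0)."""
    return sum(p for v, p in dist.items() if v > w)

dist = rank_sum_null(5, 5)
# Illustrative targets (assumptions): A0 at least .60 and R0 at least .03.
w_lo = min(w for w in dist if prob_below(dist, w) >= 0.60)
w_hi = max(w for w in dist if prob_above(dist, w) >= 0.03)
A0, R0 = prob_below(dist, w_lo), prob_above(dist, w_hi)
I0 = 1 - A0 - R0
print(f"w' = {w_lo}, w'' = {w_hi}, A0 = {A0:.4f}, I0 = {I0:.4f}, R0 = {R0:.4f}")
```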


8. Efficiency. In this section, we shall investigate the power efficiency of the R. O. test as compared with Wald's sequential probability ratio test.

Let $N(x; \theta, \sigma^2)$ be a cumulative normal distribution with an unknown mean $\theta$ and known variance $\sigma^2$. Let the hypothesis $H_0: \theta = \theta_0$ be tested against an alternative hypothesis $H_1: \theta = \theta_1$. We shall calculate the numerical efficiencies of the R. O. test for the five cases $\theta_1 = \theta_0 + \lambda\sigma$, $\lambda = 1.0, 1.5, 2.0, 2.5, 3.0$.

For each $\lambda$, we shall denote by $\psi(\theta)$ and $\eta(\theta)$ the power and the ASN functions of Wald's sequential probability ratio test. We denote by $\varphi_i(\theta)$ and $\mu_i(\theta)$ the power and ASN functions of the R. O. test for $i = 1, 2$, where $i = 1$ means $a = r = 1$ and $i = 2$ means $a = r = 2$. Furthermore, let (.05, .95) be the preassigned strength of all the tests. That is, for each $\lambda$ ($\lambda = 1.0, 1.5, 2.0, 2.5, 3.0$), we have $\psi(\theta_0) = \varphi_i(\theta_0) = .05$ and $\psi(\theta_0 + \lambda\sigma) = \varphi_i(\theta_0 + \lambda\sigma) = .95$ ($i = 1, 2$).

Then, for any real $\theta$, the functions $\psi(\theta)$, $\eta(\theta)$, $\varphi_i(\theta)$ and $\mu_i(\theta)$ ($i = 1, 2$) clearly depend not only on $\xi = (\theta - \theta_0)/\sigma$ but also on $\lambda$ (i.e., on $H_1$). Tables I and II give the numerical values of these functions for $\lambda = 1.0, 1.5, 2.0, 2.5, 3.0$ and selected values of $\xi$. Since the power curves for both the sequential probability ratio and the R. O. tests are close to each other


TABLE I
Power and ASN functions $\psi(\theta)$, $\eta(\theta)$ of the sequential probability ratio test and $\varphi_i(\theta)$, $\mu_i(\theta)$ ($i = 1, 2$) of the R. O. tests, with approximate efficiencies, for selected values of $\xi$
[Numerical entries not legible in the source scan.]

TABLE II
[Numerical entries not legible in the source scan.]


in all the cases considered, the ratios

(8.1)  $e_1 = \eta(\theta)/\mu_1(\theta)$,  $e_2 = \eta(\theta)/\mu_2(\theta)$,

given in the tables can be regarded as the approximate power efficiencies.
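The efficiencies (8.1) can be recomputed at individual points. The sketch below assumes that $\eta$ comes from Wald's standard approximations:

- $\eta(\theta) = [\psi(\theta) \log A + (1 - \psi(\theta)) \log B]/E_\theta(z)$, where $A = \varphi/\alpha$, $B = (1 - \varphi)/(1 - \alpha)$ and $z = \log[f_1(x)/f_0(x)]$;
- $\eta = -\log A \log B / E_\theta(z^2)$ where $E_\theta(z) = 0$.

Here $A$ and $B$ are Wald's stopping bounds, not zone probabilities. For the R. O. test with $a = r = 1$, the sketch uses $\mu_1 = 1/(A_0 + R_0)$ at $\theta_0$, with $x_1$ and $x_2$ fixed by the strength (.05, .95) as described in Section 7. Only $\lambda = 1$ and the points $\theta_0$ and $\theta_0 + \lambda\sigma/2$ are evaluated.

```python
from math import log
from statistics import NormalDist

Z = NormalDist()
alpha, phi, lam = 0.05, 0.95, 1.0        # strength (.05, .95); theta1 = theta0 + lam*sigma
logA = log(phi / alpha)                   # Wald's upper stopping bound, log scale
logB = log((1 - phi) / (1 - alpha))       # Wald's lower stopping bound, log scale

# Wald's approximate ASN with theta0 = 0, sigma = 1, so z = lam*x - lam**2/2.
eta0 = (alpha * logA + (1 - alpha) * logB) / (-lam**2 / 2)   # at theta0, psi = alpha
eta_mid = -logA * logB / lam**2                             # at theta0 + lam/2, E z = 0

# R. O. test, a = r = 1: with x2 = lam - x1 (symmetry about the midpoint),
# both equalities of the (7.1) type reduce to Phi(x1) = 19 * Phi(x1 - lam).
def gap(x1):
    return Z.cdf(x1) - ((1 - alpha) / alpha) * Z.cdf(x1 - lam)

lo, hi = -6.0, lam / 2                    # gap(lo) > 0 > gap(hi)
for _ in range(100):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if gap(mid) > 0 else (lo, mid)
x1 = (lo + hi) / 2
A0, R0 = Z.cdf(x1), Z.cdf(x1 - lam)       # R0 = 1 - Phi(x2) with x2 = lam - x1
mu1 = 1 / (A0 + R0)                       # ASN at theta0 when a = r = 1
e1 = eta0 / mu1
print(f"eta(theta0) = {eta0:.4f}, eta(midpoint) = {eta_mid:.4f}")
print(f"x1 = {x1:.4f}, mu1(theta0) = {mu1:.2f}, e1(theta0) = {e1:.3f}")
```

At $\theta_0$ the Wald figure is about 5.30, while $\mu_1$ is roughly 57, so $e_1$ is below .1 there. This is in line with remark (a) below on taking $a$ and $r$ large when $\lambda$ is small.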

From Tables I and II, we observe the following:

(a) To obtain high efficiencies when both types of error are fixed and the difference $(\theta_1 - \theta_0)/\sigma$ is small, it seems that one should make $a$ and $r$ large.

(b) Some of the figures in the tables are misleading. Whichever procedure is used, at least one observation must be taken before a decision can be made, so the ASN in either case must be at least one. However, some of the figures for Wald's case are less than one, which cannot be regarded as practical. If we assume that the ASN is at least one, the efficiencies in the case $\theta_1 = \theta_0 + 3\sigma$ are at least .87 uniformly.

(c) Suppose one wants to improve the efficiency of testing the hypothesis $H_0: \theta = \theta_0$ against the alternative hypothesis $H_1: \theta = \theta_0 + \tfrac{1}{2}\sigma$. One may take the observations in groups of size 25 and apply the R. O. test to the means $\bar{x}$ (using $a = r = 1$). In other words, one now tests the same null hypothesis $H_0$ against the equivalent alternative hypothesis $H_1': \theta = \theta_0 + 2.5\sigma_{\bar{x}}$, where $\sigma_{\bar{x}} = \sigma/5$. Consequently, the efficiencies are raised to at least 69 per cent for all alternatives $\theta$ between $\theta_0$ and $\theta_0 + \tfrac{1}{2}\sigma$.

The author wishes to thank the referee for the helpful comments.

ON A CONTAGIOUS DISTRIBUTION


By R. S. G. RUTHERFORD
University of Sydney, Australia


1. Summary. The purpose of this paper is to discuss the probability distribu- 
tion that arises when the probability of success at any trial depends linearly upon 
the number of previous successes. Such a scheme has obvious uses in both bio- 
logical and economic fields. 

It will be shown that by assuming a simple linear relationship between the 
number of previous successes and the probability of success in the next trial, we 
can derive a distribution that is reasonably easy to handle, provides as good a
fit as more usual distributions, and has parameters which are capable of easy 
physical interpretation. Moreover, for appropriate values of the parameters the 
negative binomial and the Gram-Charlier systems can be shown to be close 
approximations. 


2. Introduction. Considerable attention has recently been directed to models 
where previous experience determines the probabilities in the forthcoming trial. 
This study is particularly indebted to the work of Woodbury [1]. Much of the 
recent work has developed the probability scheme originally postulated by Polya 
[2]. Here it is intended to extend that suggested by Woodbury, and it may be 
well to contrast the two schemes. 

In the Polya scheme, we have an urn containing b black and w white balls. 
After each random drawing, the drawn ball is returned together with c balls of 
the same colour. Thus the chance of drawing a ball of given colour depends upon 
both the number of previous successes and of previous failures. 

The Woodbury scheme involves the return of the drawn ball only, if the draw 
be a failure, and in the event of the draw being a success, the reconstitution of 
the urn, for example, by the replacement of “failure” balls by ‘‘success’’ balls. 
In this scheme the order of success is important; in the Polya scheme it is not. 

Formally, the Woodbury scheme is as follows. Let $P(n, x)$ be the probability of exactly $x$ successes in $n$ trials, and let $p_x$ be the probability of success after $x$ previous successes. Then

(1)  $P(n + 1, x + 1) = p_x P(n, x) + q_{x+1} P(n, x + 1)$,

where $q_x = 1 - p_x$. Woodbury has solved this problem in the general case.


In this article we postulate further that $p_x$ is a simple linear function of $x$, viz.:

(2)  $p_x = p + cx, \qquad 0 \le x \le n$.

Since we must have $0 \le p_x \le 1$, we obtain the limiting conditions

(3)  $c > 0$: $n \le q/c$;  $c < 0$: $n \le p/|c|$,

where $q = 1 - p$. This implies that $c$ must always be of order $n^{-1}$ or smaller. In practice these conditions do not prove very restrictive.


3. The distribution and its properties. Following Woodbury, the solution of (1) and (2) may be shown to be

(4)  $P(n, x) = \dfrac{1}{x!}\, \dfrac{p}{c}\left(\dfrac{p}{c} + 1\right)\left(\dfrac{p}{c} + 2\right) \cdots \left(\dfrac{p}{c} + x - 1\right) \sum_{r=0}^{x} (-1)^r \binom{x}{r} (q - rc)^n$.

The summation term is clearly the coefficient of $\theta^n/n!$ in $e^{q\theta}(1 - e^{-c\theta})^x$.


It is desirable to consider the effect of the restrictions (2) and (3) on the value of $P(n, x)$. We now show (a) that $P(n, x)$ is zero for $x > n$, and (b) that $P(n, x)$ is always positive for $0 \le x \le n$.

(a) Write the term $(1 - e^{-c\theta})^x$ as $(c\theta - c^2\theta^2/2! + \cdots)^x$. It contains only terms of order $\theta^x$ and higher powers of $\theta$, so the summation term is zero for all $x > n$.

(b) For $P(n, x)$, as given by relation (4), to be positive throughout the range $x \le n$, two sign conditions are required. With $c > 0$, the product term is always positive, so the summation term must always be positive. With $c < 0$, the product term is alternately positive and negative, so the summation must also alternate in sign, being positive when $x$ is even and negative when $x$ is odd. Regarding the summation as the leading $x$th difference of the series $q^n, (q - c)^n, \ldots, (q - nc)^n$ shows immediately that (3) is a necessary and sufficient guarantee for the summation term to have the correct sign.
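The closed form (4) can be checked against the recurrence (1) with $p_x = p + cx$. The sketch below builds $P(n, x)$ by iterating (1) from $P(0, 0) = 1$. It then compares every term with (4), using exact rational arithmetic. The two parameter sets are illustrative and satisfy (3), one with $c > 0$ and one with $c < 0$.

```python
from fractions import Fraction
from math import comb, factorial

def by_recurrence(n, p, c):
    """P(n, x), x = 0..n, by iterating (1) with p_x = p + c*x and q_x = 1 - p_x."""
    P = [Fraction(1)]                          # P(0, 0) = 1
    for m in range(n):
        nxt = [Fraction(0)] * (m + 2)
        for x, prob in enumerate(P):
            px = p + c * x
            nxt[x] += (1 - px) * prob          # failure: still x successes
            nxt[x + 1] += px * prob            # success: x + 1 successes
        P = nxt
    return P

def by_formula(n, x, p, c):
    """Closed form (4): rising-factorial product times the alternating sum."""
    q, k = 1 - p, p / c
    prod = Fraction(1)
    for j in range(x):
        prod *= k + j
    s = sum((-1) ** r * comb(x, r) * (q - r * c) ** n for r in range(x + 1))
    return prod * s / factorial(x)

n = 6
checks = []
for p, c in [(Fraction(1, 10), Fraction(1, 20)), (Fraction(3, 10), Fraction(-1, 40))]:
    rec = by_recurrence(n, p, c)
    agree = all(rec[x] == by_formula(n, x, p, c) for x in range(n + 1))
    checks.append((p, c, rec))
    print(f"p={p}, c={c}: agree={agree}, total={sum(rec)}")
```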

The generating function of $P(n, x)$ is the coefficient of $\theta^n/n!$ in $e^{q\theta}[1 - t(1 - e^{-c\theta})]^{-p/c}$, which may be written

(5)  $e^{\theta}[(1 - t)e^{c\theta} + t]^{-p/c}$.

This expression has an infinite number of terms, but the terms containing $\theta^n$ occur only in the $n + 1$ terms containing powers of $t$ from 1 to $t^n$. Thus the generating function is a finite one. That the sum of all the $P(n, x)$ for $x = 0, \ldots, n$ equals 1 may be confirmed by putting $t = 1$ in (5) and considering the coefficient of $\theta^n/n!$ in $e^{\theta}$. Replacing $t$ by $1 + u$ in (5) gives the factorial moment generating function as the coefficient of $\theta^n/n!$ in

(6)  $e^{\theta}[1 - u(e^{c\theta} - 1)]^{-p/c}$.

Denoting the $r$th factorial moment by $f_r$, we then have

(7)  $f_1 = (p/c)[(1 + c)^n - 1]$,
     $f_2 = (p/c)(p/c + 1)[(1 + 2c)^n - 2(1 + c)^n + 1]$,
     $f_3 = (p/c)(p/c + 1)(p/c + 2)[(1 + 3c)^n - 3(1 + 2c)^n + 3(1 + c)^n - 1]$.
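The factorial moments (7) can be checked numerically against the distribution itself. The sketch below recomputes $P(n, x)$ from the recurrence (1), this time in floating point. It then compares $\sum_x x(x - 1)\cdots(x - r + 1)\,P(n, x)$ with the three expressions in (7). The parameters are the values reported for the yeast-cell data in Table II ($n = 13$, $p = .046747$, $c = 0.019088$), used here only as a convenient test case.

```python
def distribution(n, p, c):
    """P(n, x), x = 0..n, from the recurrence (1) with p_x = p + c*x."""
    P = [1.0]
    for m in range(n):
        nxt = [0.0] * (m + 2)
        for x, prob in enumerate(P):
            px = p + c * x
            nxt[x] += (1 - px) * prob
            nxt[x + 1] += px * prob
        P = nxt
    return P

def factorial_moment(P, r):
    """Sum of x(x-1)...(x-r+1) P(n, x)."""
    total = 0.0
    for x, prob in enumerate(P):
        falling = 1
        for j in range(r):
            falling *= x - j
        total += falling * prob
    return total

def moments_eq7(n, p, c):
    """f1, f2, f3 from (7)."""
    k = p / c
    f1 = k * ((1 + c) ** n - 1)
    f2 = k * (k + 1) * ((1 + 2 * c) ** n - 2 * (1 + c) ** n + 1)
    f3 = k * (k + 1) * (k + 2) * ((1 + 3 * c) ** n - 3 * (1 + 2 * c) ** n + 3 * (1 + c) ** n - 1)
    return f1, f2, f3

n, p, c = 13, 0.046747, 0.019088
P = distribution(n, p, c)
direct = tuple(factorial_moment(P, r) for r in (1, 2, 3))
print("direct:", direct)
print("eq (7):", moments_eq7(n, p, c))
print("expected frequencies in 400 squares:", [round(400 * v, 1) for v in P[:6]])
```

With these parameters $400P(13, 0) \approx 214.7$, which rounds to the 215 shown for the present distribution in Table II.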
4. Empiric fitting. For empiric fitting, the three moments (7) should be enough to determine the three parameters $n$, $p$ and $c$. The present writer has been unable to derive a method of fitting on maximum likelihood principles, so a somewhat cumbersome method of solution is offered. We may write (7) in the form

$p/c = a_1 f_1$,  $a_1 = \dfrac{1}{(1 + c)^n - 1}$,
$p/c + 1 = a_2 f_2/f_1$,  $a_2 = \dfrac{(1 + c)^n - 1}{(1 + 2c)^n - 2(1 + c)^n + 1}$,
$p/c + 2 = a_3 f_3/f_2$,  $a_3 = \dfrac{(1 + 2c)^n - 2(1 + c)^n + 1}{(1 + 3c)^n - 3(1 + 2c)^n + 3(1 + c)^n - 1}$.

To obtain an estimate of $n$, we can approximate these further as

$np = [1 - (n - 1)c/2]\, f_1$,
$(n - 1)(p + c) = [1 - (n - 3)c/2]\, f_2/f_1$,
$(n - 2)(p + 2c) = [1 - (n - 5)c/2]\, f_3/f_2$.

From these relations $p$ and $c$ may readily be eliminated, giving a cubic for $n$. In this cubic, $n = 1$ is always a root, so the relation may be reduced to a quadratic, of which only the positive root is relevant. Since $n$ must be integral, the nearest integer may be taken as a trial value. Having obtained $n$, it is easy to evaluate $p$ and $c$. The terms of the distribution are very sensitive to small changes in $p$ and $c$, which should therefore be evaluated carefully.

It is intended to develop tables of the values of the expressions $a_1$, $a_2$ and $a_3$. These will make the fitting less arduous, and more reliable in ranges where the above approximations are not valid.
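Because the second set of relations is only first order in $c$, a numerical alternative is to solve the exact forms directly. The sketch below is not the author's procedure. For a trial integer $n$ and an assumed bracket $0.001 \le c \le 0.5$ (so $c > 0$), it proceeds in three steps:

1. Find $c$ from $a_2 f_2/f_1 - a_1 f_1 = 1$ (the difference of the first two exact relations) by bisection.
2. Set $p = c\, a_1 f_1$.
3. Report the residual of the third relation, $a_3 f_3/f_2 - a_1 f_1 = 2$, which can be used to compare trial values of $n$.

The demonstration moments are generated from (7) with known parameters, so the recovery can be checked.

```python
def deltas(c, n):
    """The bracketed differences in (7): D1, D2, D3."""
    u1, u2, u3 = (1 + c) ** n, (1 + 2 * c) ** n, (1 + 3 * c) ** n
    return u1 - 1, u2 - 2 * u1 + 1, u3 - 3 * u2 + 3 * u1 - 1

def fit_given_n(f1, f2, f3, n, lo=1e-3, hi=0.5, iters=200):
    """For a trial n, solve a2*f2/f1 - a1*f1 = 1 for c by bisection on an assumed
    bracket (c > 0); return p, c and the residual of the third relation."""
    def g(c):
        d1, d2, _ = deltas(c, n)
        return (d1 / d2) * f2 / f1 - f1 / d1 - 1      # a1 = 1/D1, a2 = D1/D2
    if g(lo) * g(hi) > 0:
        raise ValueError("no sign change for c in the assumed bracket")
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
    c = (lo + hi) / 2
    d1, d2, d3 = deltas(c, n)
    p = c * f1 / d1                                   # p/c = a1 * f1
    residual = (d2 / d3) * f3 / f2 - f1 / d1 - 2      # a3*f3/f2 - a1*f1 - 2
    return p, c, residual

# Demonstration moments generated from (7) with n = 6, p = 0.1, c = 0.05.
k = 0.1 / 0.05
d1, d2, d3 = deltas(0.05, 6)
f1, f2, f3 = k * d1, k * (k + 1) * d2, k * (k + 1) * (k + 2) * d3
for n in range(4, 9):
    try:
        p_fit, c_fit, res = fit_given_n(f1, f2, f3, n)
        print(f"n={n}: p={p_fit:.6f} c={c_fit:.6f} residual={res:.2e}")
    except ValueError as err:
        print(f"n={n}: {err}")
```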


5. Comparisons. The results of fitting this distribution to two classical sets of
data are given in Tables I and II. In both cases, the fit of the present distribution
is at least as good as in the standard fittings. The improvement is not remarkable, 
but the parameters of the distribution have a clear physical meaning which can 
never be claimed for the parameters of the negative binomial or the Neyman 
contagious set [5]. That is the major claim made for this work. 

It is intended now to investigate why other distributions appear to be close 
approximations in certain circumstances. It is important, however, to make 
plain the purpose of the following sections. There is no intention to discuss the 


TABLE I
Accidents to women working on H.E. shells, data of Greenwood and Yule [3]
$n = 6$, $p = 0.059886$, $c = 0.103036$
[Rows for number of accidents, observed frequency, negative binomial, Neyman contagious distribution and present distribution; the frequencies are not legible in the source scan.]

TABLE II
Yeast cells in 400 squares of a haemacytometer, data of "Student" [4]
$n = 13$, $p = .046747$, $c = 0.019088$

Number of yeast cells      0      1      2      3      4      5    Total
Observed frequency       213    128      …     18      …      …      400
Negative binomial        214    123      …     13      …      …      400
Present distribution     215    122      …     14      …      …      400
Gram-Charlier Type B     216    119      …     15      …      …      400


minutiae of the conditions under which the approximations will be valid. Such
conditions may be found by anyone sufficiently interested.

The purpose in this context is merely to explain why certain distributions have 
provided reasonably good fits to empiric data which may have, in fact, been gen- 
erated by a system of the type of (1). A second, and perhaps subsidiary, point is 
that the fitting of the distribution is difficult, particularly as no maximum likeli- 
hood method seems available. This may be overcome in certain ranges by fitting 
these other distributions, where the parameters are easier to determine, if these 
parameters can be interpreted in terms of those of the present distribution. 

To illustrate and confirm the following sections, a number of actual distribu- 
tions have been evaluated, together with the approximations under discussion. 
These are given in Section 8. 


6. Binomial approximations. In the negative binomial generated by 
(8) (i + P) — Py", 
we have, by standard methods 
(9) fi = kP, fo = klk + 1)P’, fs = k(k + 1)\(k + 2)P". 
By the method of moments we can then determine the parameters as 
(10) k = fi/(fe — f = (fe — fi)/fir. 


Comparing (9) with (7) shows a considerable similarity of form, if we assume
that c, and hence p/c, are positive. If c and n be small enough for us to equate
$(1 + 2c)^n$ and $(1 + c)^{2n}$, we will have at once

(11) $k = p/c, \quad P = (1 + c)^n - 1.$
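To see how close (11) is, one can compute the distribution exactly. The sketch below assumes, as our own reading of the scheme behind (1) and (4) (which lie outside this excerpt), that the chance of an occurrence on any trial is $p + jc$ when j occurrences have already happened. Under that assumption the first factorial moment equals kP exactly, with k and P taken from (11), while the second differs from $k(k + 1)P^2$ only through terms in $nc^2$. The helper names are hypothetical.

```python
def exact_distribution(n, p, c):
    """Exact P(n, x), x = 0..n, by stepping through the n trials.

    Assumption (hypothetical reading of the scheme): after j occurrences the
    chance of an occurrence on the next trial is p + j*c, taken to lie in [0, 1].
    """
    probs = [1.0]                          # probs[j] = P(j occurrences so far)
    for _ in range(n):
        nxt = [0.0] * (len(probs) + 1)
        for j, pr in enumerate(probs):
            chance = p + j * c
            nxt[j + 1] += pr * chance      # an occurrence on this trial
            nxt[j] += pr * (1.0 - chance)  # no occurrence
        probs = nxt
    return probs


def factorial_moment(dist, r):
    """r-th factorial moment: sum over x of x(x-1)...(x-r+1) P(x)."""
    total = 0.0
    for x, px in enumerate(dist):
        falling = 1
        for i in range(r):
            falling *= x - i
        total += falling * px
    return total


n, p, c = 6, 0.059886, 0.103036             # parameters quoted for Table I
dist = exact_distribution(n, p, c)
k, P = p / c, (1 + c) ** n - 1              # relation (11)
print(factorial_moment(dist, 1), k * P)                 # equal up to rounding
print(factorial_moment(dist, 2), k * (k + 1) * P ** 2)  # differ through terms in n*c**2
```

The second printed pair differs because, under this scheme, the exact second factorial moment contains $(1 + 2c)^n$ where the negative binomial form has $(1 + c)^{2n}$.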


The necessary conditions for this to be valid are somewhat complicated but
involve that terms in $nc^2$ may be neglected and/or that $p/c < n - 1$. In practice
the first of these is rarely likely to be obtained. However, it can still be
demonstrated, by some rather cumbersome analysis not shown here, that so long as
$p/c < n - 1$ a negative binomial can be fitted, though the parameters no longer bear
easy interpretation in terms of those of the original distribution.

A positive binomial generated by, say, $(Q' + P't)^{k'}$ might provide a good fit
if c is either negative, or positive with $p/c \ge n - 1$, that is, with $f_1^2 > f_2$. The
first possibility is restricted by the fact that it also would appear to require
strictly $p/|c| < n - 1$, contrary to (3). The approximation, however, is reasonably
good for values of $p/|c|$ of about the same order as n. The second case also
seems to be relevant only when p/c exceeds n − 1 by a small amount. Cases
where p/c greatly exceeds n − 1 may be handled more satisfactorily otherwise,
as in the following section.

Both cases, however, suffer from the difficulty that the value of k' is not, in
general, integral. This will not be an insuperable difficulty if P' be small and k'
large, for then we may be in the territory where a Poisson distribution may
approximate to the positive binomial and hence to the original distribution.
However, again it seems that large values of p/c or $p/|c|$ (whether exceeding n or
not) may be dealt with best by the Gram-Charlier approximations.
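Which of the two binomial forms applies is decided by the sign of $f_2 - f_1^2$. A minimal sketch, assuming $Q' = 1 - P'$ so that $(Q' + P't)^{k'}$ is a probability generating function with factorial moments $f_1 = k'P'$ and $f_2 = k'(k' - 1)P'^2$; the function name is hypothetical, and the boundary case simply returns a Poisson with $m = f_1$.

```python
def choose_binomial_form(f1, f2):
    """Fit a negative or positive binomial by moments, by the sign of f2 - f1**2.

    Negative binomial (1 + P - Pt)**(-k):  f2 - f1**2 = k P**2 > 0.
    Positive binomial (Q' + P't)**k':      f1**2 - f2 = k' P'**2 > 0.
    Returns (family, exponent, P), or ("Poisson", m, None) in the boundary case.
    """
    excess = f2 - f1 ** 2
    if excess > 0:
        return "negative binomial", f1 ** 2 / excess, excess / f1
    if excess < 0:
        deficit = -excess
        # k' = f1**2 / deficit is generally not an integer; when P' is small and
        # k' large, a Poisson with m = k'P' = f1 is the practical substitute.
        return "positive binomial", f1 ** 2 / deficit, deficit / f1
    return "Poisson", f1, None


print(choose_binomial_form(1.0, 0.75))   # ('positive binomial', 4.0, 0.25)
```

This mirrors the procedure described for Type IV in Section 8, where the non-integral exponent is sidestepped by fitting a Poisson.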


7. Gram-Charlier approximations. Stage I, binomial type. Let us now consider
cases where the ratio p/c is large. Returning to (4), we have already shown that
the summation term is the coefficient of $\theta^n/n!$ in $e^{q\theta}(1 - e^{-c\theta})^x/c^x$. If c be sufficiently
small, then we have


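(13) $\left[(1 - e^{-c\theta})/c\right]^x = \theta^x\left[1 - c\theta x/2 + c^2\theta^2 x(3x + 1)/24 - \cdots\right].$

Hence the summation term is

(14) $\dfrac{n!}{(n - x)!}\, q^{n-x}\left[1 - \dfrac{cx(n - x)}{2q} + \dfrac{c^2x(3x + 1)(n - x)(n - x - 1)}{24q^2} - \cdots\right].$

Alternatively, with c small, the summation term is

(14a) $\dfrac{n!}{(n - x)!}\,(q - cx/2)^{n-x} = \dfrac{n!}{(n - x)!}\, q^{n-x}\left[1 - \dfrac{cx(n - x)}{2q} + \dfrac{c^2x^2(n - x)(n - x - 1)}{8q^2} - \cdots\right].$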


These approximations are, of course, true for all values of p/c. 
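The expansion (14) can be checked against the exact summation term in rational arithmetic. The sketch assumes the summation term is n! times the coefficient of $\theta^n$ in $e^{q\theta}[(1 - e^{-c\theta})/c]^x$; expanding $(1 - e^{-c\theta})^x$ binomially turns this into the finite alternating sum used below. Because (14) stops at $c^2$, halving c should cut the error by roughly a factor of 8. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial


def summation_term_exact(n, x, q, c):
    """n! [theta**n] exp(q theta) ((1 - exp(-c theta)) / c)**x, computed exactly.

    Since (1 - e^{-c theta})**x = sum_j (-1)**j C(x, j) e^{-j c theta},
    the coefficient of theta**n / n! is sum_j (-1)**j C(x, j) (q - j c)**n.
    """
    total = sum((-1) ** j * comb(x, j) * (q - j * c) ** n for j in range(x + 1))
    return total / c ** x


def summation_term_series(n, x, q, c):
    """The three-term expansion (14)."""
    lead = Fraction(factorial(n), factorial(n - x)) * q ** (n - x)
    first = c * x * (n - x) / (2 * q)
    second = c ** 2 * x * (3 * x + 1) * (n - x) * (n - x - 1) / (24 * q ** 2)
    return lead * (1 - first + second)


q = Fraction(1, 2)
errors = []
for c in (Fraction(1, 100), Fraction(1, 200)):
    err = summation_term_exact(10, 3, q, c) - summation_term_series(10, 3, q, c)
    errors.append(err)
    print(c, float(err))
print(float(errors[0] / errors[1]))   # close to 8: the remainder is of order c**3
```

Exact fractions avoid the cancellation that the alternating sum would suffer in floating point when c is small.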
If we leave the product part of P(n, x) in its original form, obviously we can
obtain as an approximation to the whole expression

(15) $P(n, x) \approx \binom{n}{x}\, p(p + c)(p + 2c)\cdots[p + (x - 1)c]\,(q - cx/2)^{n-x}.$

In this form there is more hope of a maximum likelihood fit.


If, however, p/c be sufficiently large, we may write the product term of (4) as

(16) $p(p + c)\cdots[p + (x - 1)c] = p^x \prod_{r=0}^{x-1}\left(1 + \dfrac{rc}{p}\right) = p^x\left[1 + \dfrac{x(x - 1)}{2}\,\dfrac{c}{p} + \dfrac{x(x - 1)(x - 2)(3x - 1)}{24}\,\dfrac{c^2}{p^2} + \cdots\right],$

and hence derive

(17) $P(n, x) = \binom{n}{x} q^{n-x} p^x \Big\{1 + x[x - (np + q)]\dfrac{c}{2pq} + x\big[p^2(3x + 1)(n - x)(n - x - 1) - 6pqx(x - 1)(n - x) + q^2(x - 1)(x - 2)(3x - 1)\big]\dfrac{c^2}{24p^2q^2} + \cdots\Big\}.$

The term in $c^2$ will be at most of order $c^2n^4/8$, and subsequent terms of smaller
order. If, therefore, such terms may be neglected, we may write

(17a) $P(n, x) = \binom{n}{x} q^{n-x} p^x \left\{1 + x[x - (np + q)]\dfrac{c}{2pq}\right\}.$
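To gauge how much the $c^2$ term contributes, the sketch below evaluates the second-order expansion (17), whose $c^2$ bracket is $p^2(3x + 1)(n - x)(n - x - 1) - 6pqx(x - 1)(n - x) + q^2(x - 1)(x - 2)(3x - 1)$, and its first-order truncation (17a), and compares both with an exact calculation. As an assumption of ours (the scheme itself is not stated in this excerpt), the exact calculation takes the chance of an occurrence to be $p + jc$ after j occurrences, with $q = 1 - p$. The parameter values are illustrative, with $p/c = 60$, and the names are hypothetical.

```python
from math import comb


def exact_distribution(n, p, c):
    """Exact P(n, x) under the assumed scheme: chance p + j*c after j occurrences."""
    probs = [1.0]
    for _ in range(n):
        nxt = [0.0] * (len(probs) + 1)
        for j, pr in enumerate(probs):
            chance = p + j * c
            nxt[j + 1] += pr * chance
            nxt[j] += pr * (1.0 - chance)
        probs = nxt
    return probs


def approx_17(n, p, c, x, second_order=True):
    """Expansion (17); with second_order=False it reduces to (17a)."""
    q = 1.0 - p
    base = comb(n, x) * q ** (n - x) * p ** x
    first = x * (x - (n * p + q)) * c / (2 * p * q)
    if not second_order:
        return base * (1 + first)
    bracket = (p ** 2 * (3 * x + 1) * (n - x) * (n - x - 1)
               - 6 * p * q * x * (x - 1) * (n - x)
               + q ** 2 * (x - 1) * (x - 2) * (3 * x - 1))
    second = x * bracket * c ** 2 / (24 * p ** 2 * q ** 2)
    return base * (1 + first + second)


n, p, c = 10, 0.3, 0.005                 # illustrative values, p/c = 60
exact = exact_distribution(n, p, c)
err_a = max(abs(approx_17(n, p, c, x, False) - exact[x]) for x in range(n + 1))
err_2 = max(abs(approx_17(n, p, c, x) - exact[x]) for x in range(n + 1))
print(err_a, err_2)   # the second-order error should be much the smaller
```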


It is instructive to compare this distribution with the binomial distribution
having constant probability p. With c positive, that is, probability increasing,
P(n, x) exceeds the corresponding binomial term for x > np + q, and P(n, x) is
less than the corresponding binomial term for 0 < x < np + q, since the factor
x[x − (np + q)] in (17a) changes sign at x = np + q; with c negative,
the conditions are reversed. (It can also be established that the conditions on c
that make the approximations valid also ensure that P(n, x) is always positive.)
If, moreover, n and p are of the order to make the binomial symmetrical, the
skewness of the distribution is an immediate guide to the sign and magnitude of c.
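A quick numerical check of these sign relations, and of the normalisation of (17a): the binomial mean of the correction factor is $E[x^2] - (np + q)E[x] = npq + n^2p^2 - (np + q)np = 0$, so (17a) sums exactly to one. The sketch takes $q = 1 - p$ and illustrative values $n = 20$, $p = 0.5$, $c = 0.004$; the names are hypothetical.

```python
from math import comb


def p17a(n, p, c, x):
    """First-order approximation (17a), with q = 1 - p."""
    q = 1.0 - p
    binomial = comb(n, x) * p ** x * q ** (n - x)
    return binomial * (1 + x * (x - (n * p + q)) * c / (2 * p * q))


def binomial_term(n, p, x):
    return comb(n, x) * p ** x * (1.0 - p) ** (n - x)


n, p, c = 20, 0.5, 0.004                 # illustrative values, c > 0
pivot = n * p + (1 - p)                  # np + q
approx = [p17a(n, p, c, x) for x in range(n + 1)]
print(sum(approx))                       # 1 up to rounding
below = [x for x in range(1, n + 1) if approx[x] < binomial_term(n, p, x)]
above = [x for x in range(1, n + 1) if approx[x] > binomial_term(n, p, x)]
print(below, above)                      # x < np + q below, x > np + q above
```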

Stage II-A, Limiting form for large n and large p. As indicated in Section 5, the
conditions necessary to ensure that the approximations will be valid for all x
have not been elaborated. It is immediately obvious that much less stringent
conditions will apply for early terms of the distribution than those required for
the whole distribution. For central values of p, the latter terms of the binomial
part of the expression for the distribution will in any case be small, and the
absolute, if not the proportionate, error small.

These considerations become important when we examine the limiting form of
(17) when n becomes large. By the change of variable $X = (x - np)/\sqrt{npq}$
used to transform the binomial into the normal distribution, we find as the
continuous distribution parallel to the normal

(18) $dP = \phi(X)\left[1 - \tfrac{1}{2}nc + \dfrac{n(np - q)c}{2\sqrt{npq}}\,X + \tfrac{1}{2}ncX^2\right] dX,$

where $\phi(X) = (2\pi)^{-1/2}\exp\{-\tfrac{1}{2}X^2\}$. If c/p be not quite small enough to make
the Stage I approximations valid, there will be discrepancies at the right tail.
The fit will be poor at both tails in any case, in the same way as the normal is a
poor approximation to the binomial at the tails. But, by and large, we may expect
to get a good fit with a curve of the form

(19) $dP = \phi(X)\left[(1 - a_2) + a_1X + a_2X^2\right] dX.$


By transferring the origin to the mean, and standardising the distribution,
we may obtain readily

(20) $dP = \phi(X)\left[1 + \mu_3 H_3(X)/3! + (\mu_4 - 3)H_4(X)/4! + \cdots\right] dX,$

which is the standard Gram-Charlier Type A distribution [6]; here $H_r(X)$ is the
Hermite polynomial of degree r, and $\mu_3$, $\mu_4$ are the moments of the standardised
variable. As shown earlier, the skewness of the system indicates the type of scheme
operating, that is, the sign and relative magnitude of c. Consideration of the order
of the terms involved indicates that we need consider only the terms listed.

Stage II-B, Limiting form for large n and small p. The limiting process used in
Stage II-A is of course valid only if p is not small. It is interesting to investigate
whether with p small (but p/c still large), we obtain the Gram-Charlier Type B
distribution. For all p we may write (17) in the form

(21) $P(n, x) = \binom{n}{x} q^{n-x}p^x - \lambda\binom{n - 1}{x - 1} q^{n-x}p^{x-1} + \lambda\binom{n - 2}{x - 2} q^{n-x}p^{x-2},$

where $\lambda = \tfrac{1}{2}n(n - 1)(p/q)c$. If now p be small, and of order $n^{-1}$, such that

$np = m, \quad (n - 1)p = m_1, \quad (n - 2)p = m_2,$

when n becomes large we obtain

(22) $P(n, x) = e^{-m}m^x/x! - \lambda e^{-m_1}m_1^{x-1}/(x - 1)! + \lambda e^{-m_2}m_2^{x-2}/(x - 2)!.$

It is now reasonable to equate the m's and write

$P(x) = e^{-m}\left[m^x/x! - \lambda m^{x-1}/(x - 1)! + \lambda m^{x-2}/(x - 2)!\right],$

which again may be written

(23) $P(x) = (e^{-m}m^x/x!)\left[1 - \lambda x/m + \lambda x(x - 1)/m^2\right].$
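A quick numerical check of (23), using illustrative values m = 2 and λ = 0.1 (our choice): the terms should sum to one, with mean $m + \lambda$ and variance $m + 3\lambda - \lambda^2$, which are the moments used when fitting this form. The function name is hypothetical, and the series is truncated at x = 59, where the Poisson tail is negligible for m = 2.

```python
from math import exp, factorial


def type_b_term(x, m, lam):
    """Gram-Charlier Type B term (23): a Poisson(m) term times a quadratic correction."""
    poisson = exp(-m) * m ** x / factorial(x)
    return poisson * (1 - lam * x / m + lam * x * (x - 1) / m ** 2)


m, lam = 2.0, 0.1
terms = [type_b_term(x, m, lam) for x in range(60)]
total = sum(terms)
mean = sum(x * t for x, t in enumerate(terms))
variance = sum(x * x * t for x, t in enumerate(terms)) - mean ** 2
print(total, mean, variance)   # approximately 1, 2.1 (= m + lam), 2.29 (= m + 3 lam - lam**2)
```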


This is the required Gram-Charlier Type B [6]. It is most easily fitted by means
of the relations

(24) $\mu'_1 = m + \lambda, \quad \mu_2 = m + 3\lambda - \lambda^2,$

which can be solved readily for m and λ. As an illustration, "Student's" data
have been fitted by this distribution also (Table II). We have m = 0.61567 and
λ = 0.06683. The fit, though reasonably good, is poorer than those previously
considered; this may reasonably be attributed to the relatively low values of
n = 13 and of p/c = 2.45.
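Solving (24) is indeed easy: subtracting the first relation from the second gives $\mu_2 - \mu'_1 = 2\lambda - \lambda^2$, so $\lambda = 1 - \sqrt{1 - (\mu_2 - \mu'_1)}$ (taking the root with $\lambda \le 1$) and then $m = \mu'_1 - \lambda$. A minimal sketch with a hypothetical function name, computing the variance with divisor N from a frequency table:

```python
from math import sqrt


def fit_type_b(freqs):
    """Moment fit of (23) through (24); freqs[x] is the frequency of the value x."""
    total = sum(freqs)
    mu1 = sum(x * f for x, f in enumerate(freqs)) / total
    mu2 = sum(x * x * f for x, f in enumerate(freqs)) / total - mu1 ** 2
    d = mu2 - mu1                      # equals 2*lam - lam**2 by (24)
    if d > 1:
        raise ValueError("needs mu2 - mu1 <= 1 for a real solution")
    lam = 1 - sqrt(1 - d)              # root of lam**2 - 2*lam + d = 0 with lam <= 1
    return mu1 - lam, lam
```

Applied to the full column of observed yeast-cell frequencies behind Table II, this should reproduce the values m = 0.61567 and λ = 0.06683 quoted above.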


8. Calculations. The validity of the approximations suggested above is
demonstrated by 16 examples in Table III. We have here a number of distributions
calculated with a selection of values for n, p, and c, and the values given by the
relevant approximating distributions. All values are quoted to four decimals,
though they have been calculated to five or more. The approximating
distributions used are identified by roman numerals.

Type I is the negative binomial fitted from the moments of the distribution.
In each case the fitted parameter values are given, together with those given
by (11) for comparison.

Type II is the approximation of the form suggested in (17a). In this case there
is no attempt to find n, p, and c from the data to give the closest fitting curve of
this type; the values used are those of the original distribution. Table III shows
the constants np + q and $c/(2pq)$ used in (17a), with the corresponding values of
n and p.

Type III presents the areas cut off in the range $x \pm \tfrac{1}{2}$ of the continuous
distribution of the form of (19). Table III shows the transformation from x to X,
and the values of $a_1$ and $a_2$ as calculated from the initial values of n, p, and c,
with no effort at improvement.
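The Type III areas can be written in closed form, because the integral of $\phi(t)[(1 - a_2) + a_1t + a_2t^2]$ from $-\infty$ to X is $\Phi(X) - a_1\phi(X) - a_2X\phi(X)$, where $\Phi$ is the standard normal distribution function (differentiate, using $\phi'(X) = -X\phi(X)$, to check). The sketch below takes $a_1$ and $a_2$ to be the coefficients of X and $X^2$ in (18), which work out from (17a) to $a_1 = n(np - q)c/(2\sqrt{npq})$ and $a_2 = nc/2$, with $q = 1 - p$. The function names and the example values are hypothetical.

```python
from math import erf, exp, pi, sqrt


def phi(X):
    """Standard normal density."""
    return exp(-0.5 * X * X) / sqrt(2.0 * pi)


def Phi(X):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(X / sqrt(2.0)))


def cdf_19(X, a1, a2):
    """Integral of the density (19) from minus infinity to X."""
    return Phi(X) - a1 * phi(X) - a2 * X * phi(X)


def type_iii(n, p, c):
    """Areas of (19) between x - 1/2 and x + 1/2, for x = 0..n."""
    q = 1.0 - p
    s = sqrt(n * p * q)
    a1 = n * (n * p - q) * c / (2.0 * s)
    a2 = n * c / 2.0

    def to_X(x):
        return (x - n * p) / s

    return [cdf_19(to_X(x + 0.5), a1, a2) - cdf_19(to_X(x - 0.5), a1, a2)
            for x in range(n + 1)]


areas = type_iii(10, 0.5, 0.01)
print([round(a, 4) for a in areas], sum(areas))
```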

Type IV is the positive binomial, used where the moments make it impossible 
to fit a negative binomial. Since the exponent is not an integer, it is fitted as a 
Poisson, of which the parameter m is given. 

Type V is a Gram-Charlier Type B of the form of (23), fitted when appropriate.
The parameters m and λ, obtained from (24), are given.

The 16 examples fall into five groups, which examine different aspects of the 
approximations. 

Group A has n = 100 and c = .0050 throughout. The value of p varies from
.0005 to .0500 and the ratio p/c from 0.1 to 10.0. With n as high as 100, only
the early terms can be evaluated readily. This limits the possible range of p,
and of the ratio p/c. The negative binomial provides a very good fit for low
values of p/c. It is still good for the earlier terms of the fourth example, and on
the whole better than the Type II approximation.

Group B has a smaller n of 20 and a larger c of 0.01, but values of p such that 
the same four values of p/c are obtained. Again, for low values of p/c the nega- 
tive binomial gives a very good fit. For larger values of this ratio, however, 
Types II and V are also relatively good fits. 

Group C comprises three examples where p/c is larger than n, which is 10 in
all three. The negative binomial can no longer be fitted. The Type II
approximations are reasonably good in all three cases. The Type IV and V
approximations in the first example suffer from the small value of n, while the
poor Type III approximation in the third example reflects the poorness of the
normal as an approximation to the binomial with n = 10 and p as large as 0.70.

Group D presents three examples designed to examine the validity of the
Gram-Charlier Type A approximation, Type III. Since n is small, a limitation
produced by the practical difficulties of computing the "exact" series, central
values of p have been taken because the binomial-normal approximation is
closest at these values. In the first two examples the Type II fit is sufficiently
good to make the Type III fit reasonable. In the third example, with a larger
c = 0.04, the Type II fit is relatively poor and the Type III fit is worse,
reflecting the fact that terms in $c^2$ may no longer be neglected in (17).

Group E contains two examples in which c is negative, —0.01, with n still 10. 
Types II and IV are used in the first example, with p = 0.10, and Types II and 
III in the second, with p = 0.50. 


9. Significance of the results. In all fields of scientific investigation the end
goal is always explanation rather than mere description. The negative binomial
and the Gram-Charlier set have been found to be good descriptive fits for a large
number of empiric distributions. Probability systems to "explain" them have
also been available.

The assumption that we are sampling from a population where the probability 
varies between individual members and is distributed in the form of a gamma 
variate will produce as the expected distribution the negative binomial. Equally, 
the Gram-Charlier system may be derived as the resultant of a small number of 
linearly additive independent causes of about the same order of importance. 

In both cases, distributions arise where these explanations are unconvincing.
It has been shown above that a much simpler hypothesis will produce
distributions that are at least as good a fit, and which in some cases, though
perhaps not in all, provide a more convincing "explanation." As with the normal
distribution, we can choose which of two alternative "explanations" is more
suitable in any particular case.

The fact that the same probability scheme “explains” both types of distribu- 
tion considerably systematises the field. Moreover, it appears that there are 
large areas of possible values for the parameters n, p, and c, where the approxi- 
mations will not be valid. It is hoped that many empiric distributions which 
previously have appeared to obey no simple law now may become more tractable. 

It is possible, with reasonable ease, to establish that both the negative binomial 
and the Gram-Charlier Type B distributions may be considered as special cases 
of the Neyman contagious distribution, and hence that our present distribution 
will often be closely represented by it. It is more difficult to establish a direct 
connection, but other writers may succeed. 

Though Neyman claims for his series that “All the constants introduced have 
meanings which are easy to interpret,” this does not appear to have been general 
experience. The distribution of the present study may be of equally general ap- 
plication and provide opportunities for much simpler interpretation. 
IDENTIFICATION AND ESTIMATION OF LINEAR
MANIFOLDS IN n-DIMENSIONS¹


By T. A. JEEVES 
University of California, Berkeley 


1. Summary. This paper investigates the problem of identifiability and estima- 
bility of linear structures in n dimensions. The concept of identifiability is 
examined to elucidate the senses in which it may be interpreted in the present 
problem. Particular attention is given to the question of treating linear subspaces 
rather than specific coordinate systems. Necessary and sufficient conditions for 
identifiability are obtained under the assumption that the “errors” follow a 
multinormal distribution. 


2. Introduction. In many fields of statistical application it is not possible to
observe directly the variables of interest but only to observe related random
variables. Let $X = (X_1, X_2, \ldots, X_n)$ be a random (row) vector which is "unobservable"
and $Y = (Y_1, Y_2, \ldots, Y_n)$ be a random (row) vector which is "observable." Assume
that $Y = XB + U$, where B is a parameter having $n \times n$ matrices of sure numbers
for values and $U = (U_1, U_2, \ldots, U_n)$ is a random (row) vector which is
stochastically independent of X.

In this paper particular attention will be given to the case in which U has a 
multinormal distribution, and it is desired to determine the row space S of the 
value of B. Two problems are considered: (a) identifiability, whether S is deter- 
mined if the distribution of Y is known [1], [2], and (b) estimability, whether S 
can be estimated consistently [1] from an infinite sequence of observations on Y. 

Similar problems were considered in 1901 by Pearson [3]. As early as 1916,
Thomson [4] showed that estimates based on moments no higher than the second
would not be consistent. In 1936, Neyman [5] indicated a set of conditions in
which, because of nonidentifiability, no consistent estimates existed. A summary
of the state of the problem in 1940 was given by Wald [6], who brought an
entirely new approach. An answer for the case of two dimensions was supplied by
Reiersøl [7] in 1948.


3. Identifiability. The problem of identification in n dimensions introduces 
features not present in the two dimensional problem. In particular, just what 
is to be identified and hence estimated must be clarified. In n dimensions a 
greater variety of possible interpretations is available. To elucidate the sense 
in which the problem is treated here, and to bring out the relationships to other 
work, it seems necessary and profitable to begin with some general remarks on 
identification, culminating in the definition of identifiability (Definition 3) 
utilized in the remainder of the paper. 


Received 12/15/52, revised 2/12/54. 
¹ This paper was prepared with the partial support of the Office of Naval Research.
Following Neyman [1], the concept of identifiability is defined in the following
way. Let L be a relation $L(\vartheta, F)$ between the elements $\vartheta$ of a space $\Theta$ and the
elements F of a set $\Omega$ of distribution functions. Let $\omega(\vartheta) = \{F \mid L(\vartheta, F)\}$. Let
$\theta$ be a parameter with range $\Theta_0 \subset \Theta$.

DEFINITION 1. $\theta$ is identifiable (L) if the sets $\omega(\vartheta)$ are disjoint for every $\vartheta \in \Theta_0$.

This definition generalizes that of Neyman, in that $\Theta$ and $\Theta_0$ are not
necessarily identical. The definition emphasizes the relation L between elements of $\Theta$
and elements of $\Omega$. Essentially the definition states that identifiability obtains
if no two distinct elements of $\Theta_0$ are related to the same distribution function.
However, for the succeeding discussion, it is important to notice that this
definition implies the existence of a relation among the elements of the larger space $\Theta$,
and that this relation characterizes the nature of the identification. The following
theorem, which follows easily from the definition, brings out this point. The
following notation is introduced: for any $F \in \Omega$,

$\gamma(F) = \{\vartheta \in \Theta \mid F \in \omega(\vartheta)\}, \quad \Gamma(\vartheta) = \bigcup_{F \in \omega(\vartheta)} \gamma(F), \quad \Omega_0 = \bigcup_{\vartheta \in \Theta_0} \omega(\vartheta).$


THEOREM 1. $\theta$ is identifiable (L) if and only if there is a relation R between
the elements of $\Theta$ such that for every $\vartheta \in \Theta_0$ and every $\vartheta^* \in \Gamma(\vartheta)$,

(i) $R(\vartheta, \vartheta^*)$ holds, and

(ii) $R(\vartheta^*, \vartheta)$ holds only if $\vartheta^* = \vartheta$.

The relation R is uniquely defined by the relation L for any $\vartheta \in \Theta_0$
and $\vartheta^* \in \Gamma(\vartheta)$; conversely, specifying R implies restrictions on L.

From this it is seen that if $\theta$ is identifiable (L) then there is a one-to-one
correspondence for $\vartheta \in \Theta_0$ between $\vartheta$ and $\omega(\vartheta)$, and also between $\vartheta$ and $\Gamma(\vartheta)$. Further,
every $F \in \Omega_0$ determines a unique value of $\vartheta$, and there exists a function f with
the domain $\Omega_0$ and range $\Theta_0$ such that if $F \in \Omega_0$, then $F \in \omega(f(F))$. If $\Theta = \Theta_0$,
then the relation R is equality. In the following, particular attention will be
given to the case in which the $\vartheta$ are linear spaces and R is the relation of inclusion.

To apply the definition to the problem considered, it is necessary to exhibit
the relation L. Let M be the set of $n \times n$ matrices and $\Theta$ a family of subsets of M.
Let $\Omega$ be the set of n-dimensional distribution functions and $\mathfrak{X}$, $\mathfrak{U}$, and $\mathfrak{Y}$ be
nonempty subsets of $\Omega$ associated with the random variables X, U, and Y,
respectively. Let $F_X$, $F_U$, and $F_Y$ be the distribution functions of the random
variables X, U, and Y, respectively.

DEFINITION 2. For any sets $\mathfrak{X}$, $\mathfrak{U}$, and $\Theta$, the relation $L(\vartheta, F_Y)$ holds if $B \in \vartheta$
and if $Y = XB + U$ for some X and U such that $F_X \in \mathfrak{X}$ and $F_U \in \mathfrak{U}$.

It has been shown [8] that conditions must be imposed on both $\mathfrak{X}$ and $\mathfrak{U}$ if $\theta$
is to be identifiable (L).

Further analysis of the problem requires consideration of the effect on the
matrix B of a nonsingular transformation P. Identification problems may be
proposed in which the space $\Theta_0$ is so specialized that PB no longer belongs to an
element of $\Theta_0$, or may not for certain P. Such problems will not be considered
in this paper. It will be assumed that $\Theta_0$ has the following property: for any
nonsingular $n \times n$ matrix P, if $B \in \vartheta \in \Theta_0$, then $PB \in \vartheta^*$ for some $\vartheta^* \in \Theta_0$.
Considering the definition of L above, it follows that:

THEOREM 2. In any sets $\Theta$, $\Theta_0$, $\mathfrak{X}$, and $\mathfrak{U}$ such that $\Theta_0 \subset \Theta$ and $\Theta_0$ has the above
property, if $\theta$ is identifiable (L) then for each nonsingular matrix P either

(a) $F_X \in \mathfrak{X}$ implies $F_{XP} \notin \mathfrak{X}$ for every X, or

(b) $B \in \vartheta$ implies $P^{-1}B \in \vartheta$ for every $\vartheta \in \Theta_0$.

The content of this theorem indicates two broad categories of problems, those 
in which condition (a) is satisfied by all nonsingular matrices and those in which 
condition (b) is satisfied by all nonsingular matrices. Mixed problems in which 
some matrices P satisfy (a) and some (b) might also be considered. The assump- 
tion that condition (a) is satisfied for every P leads to the consideration de- 
veloped by Koopmans [2], [9]. 

This paper explores the implications of assuming that condition (b) is satisfied
for all P, that is, that the matrices belonging to $\vartheta$ are all row equivalent or have the
same row space. It will thus be convenient to think of $\vartheta$ as a row space. With this
interpretation the problem being considered below is that of identifying the row 
space of the matrix B. The row space is a natural parameter in the problem of 
general linear structures. As such problems frequently arise, the components of 
X are presumed to lie in a linear subspace of Euclidean n-space; the determination 
of this linear subspace is desired. The specification of a particular set of co- 
ordinates on this subspace (that is, the determination of B) is frequently not 
required. 

Throughout the remainder of this paper it will be assumed that the elements of
$\Theta_0$ are the sets of row-equivalent matrices corresponding to the various row
spaces of dimension s and that $\Theta = \bigcup \Theta_0$. It will also be assumed that $\mathfrak{U}$ is the
set of multinormal distributions. Since $\mathfrak{X}$ is not specified, the relation L is not
completely determined. Instead of specifying the set $\mathfrak{X}$, it will suffice to select
the relation R (see Theorem 1) and investigate what conditions on $\mathfrak{X}$ are
necessary and sufficient for identifiability. Two natural relations among linear spaces
are the relation of equality and the relation of inclusion. The treatment here will
be confined to the relation of inclusion. Similar results for the relation of equality
have been obtained [8].

In view of the preceding considerations, the definition of identifiability may be
particularized for the relation of inclusion as follows:

DEFINITION 3. $\theta$ is identifiable (L*) if $S(\vartheta) \subset S(\vartheta^*)$ for every $\vartheta \in \Theta_0$ and $\vartheta^* \in \Gamma(\vartheta)$.

Here $S(\vartheta)$ denotes the row space of $\vartheta$, that is, the vector space spanned by the
row vectors of any element of $\vartheta$, while L* denotes a relation L which gives rise
to the relation R of inclusion (see Theorem 1). Here $R(\vartheta, \vartheta^*)$ means $S(\vartheta) \subset S(\vartheta^*)$.


4. Necessary and sufficient conditions. As in the case of two dimensions [7], 
identifiability is related to a lack of normality in the random variable X. This 
concept of the amount of nonnormality of a random variable is defined below. 

DEFINITION 4. The dimension of a random variable U is the smallest dimension 
of all linear subspaces which contain U with probability one. 
DEFINITION 5. Let nn(Y) be the least value of d such that $Y = U + V$ with
U and V independent, V having a multinormal distribution and U having
dimension d. This value nn(Y) will be called the nonnormality of Y.

DEFINITION 6. The nonnormality of $\mathfrak{Y}$ is s (i.e., $nn(\mathfrak{Y}) = s$) if $nn(Y) = s$ for
every Y such that $F_Y \in \mathfrak{Y}$.

In terms of the definition of nonnormality, the main result on identifiability
of linear manifolds in n dimensions can be stated as follows.

THEOREM 3. $\theta$ is identifiable (L*) if and only if $nn(\mathfrak{Y}) = s$.

The proof of the theorem depends on the following lemmas, the proofs of
which are straightforward [8] and will not be given.

LEMMA 1. If M is an $n \times n$ matrix with rank s and if $(n - s)$ columns of M are
identically zero, then every row either belongs to some $s \times s$ submatrix with rank s,
or else is identically zero.

LEMMA 2. If A is a symmetric matrix, E is a diagonal matrix with ones and
zeros on the main diagonal, and EAE is positive semidefinite, then there exist matrices
C, G, and H such that

$CAC' = DGD + EHE,$

where $D + E = I$ (the identity) and H is a diagonal matrix with ones and zeros
on the main diagonal, C is nonsingular, and $CD = D$.

The following choice of notation has been made. The symbol f(t) will be used
to denote some polynomial of the second degree in t, but not necessarily the same
polynomial at each usage. Distinct polynomials will not generally be
distinguished. The characteristic function of a random variable X will be denoted by
$g_X(t) = \int e^{itx'}\, dF_X(x)$, where t and x are row vectors. Further, $\psi_X(t) = -\log g_X(t)$.

LEMMA 3. If $\psi_Y(t) = \psi_X(tB') + \psi_U(t)$, and if U has a multinormal distribution,
then for any matrix C which is idempotent and row-equivalent to B,

$\psi_Y(t) = \psi_Y(tC') + f(t).$

In particular C may be the canonical form of B.

DEFINITION 7. The canonical form of the matrix B is a matrix C which is row
equivalent to B, with elements satisfying, for each $i = 1, \ldots, n$,

(a) $c_{ii} = 0$ or $c_{ii} = 1$;

(b) if $c_{ii} = 0$, then $c_{ij} = 0$ for all j and $c_{ji} = 0$ for $j \ge i$;

(c) if $c_{ii} = 1$, then $c_{ij} = 0$ for $j < i$ and $c_{ji} = 0$ for $j \ne i$.
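One way to compute the canonical form of Definition 7 is to take the reduced row echelon form of B and move each nonzero row so that its leading 1 sits on the main diagonal; the result is row-equivalent to B and idempotent, as Lemma 3 requires. A sketch in exact rational arithmetic, with hypothetical function names:

```python
from fractions import Fraction


def canonical_form(B):
    """Canonical form (Definition 7) of a square matrix B given as a list of rows."""
    n = len(B)
    R = [[Fraction(v) for v in row] for row in B]
    pivots = []                          # pivot column of each nonzero echelon row
    top = 0
    for col in range(n):
        pivot_row = next((i for i in range(top, n) if R[i][col] != 0), None)
        if pivot_row is None:
            continue                     # no pivot in this column
        R[top], R[pivot_row] = R[pivot_row], R[top]
        lead = R[top][col]
        R[top] = [v / lead for v in R[top]]
        for i in range(n):               # clear the rest of the pivot column
            if i != top and R[i][col] != 0:
                factor = R[i][col]
                R[i] = [a - factor * b for a, b in zip(R[i], R[top])]
        pivots.append(col)
        top += 1
    C = [[Fraction(0)] * n for _ in range(n)]
    for row, col in enumerate(pivots):
        C[col] = R[row]                  # the leading 1 lands on the diagonal
    return C


def matmul(A, B):
    """Matrix product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


B = [[0, 1, 2], [0, 2, 4], [1, 0, 1]]
C = canonical_form(B)
print(C)                  # rows (1, 0, 1), (0, 1, 2), (0, 0, 0), as Fractions
print(matmul(C, C) == C)  # True: C is idempotent
print(matmul(B, C) == B)  # True: every row of B lies in the row space of C
```

The last check uses the fact that an idempotent C fixes every vector of its own row space, so $BC = B$ exactly when $S(B) \subset S(C)$.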

Lemma 2 can be used to prove

LEMMA 4. If $\psi_Y(t) = \psi_Y(tB') + f(t)$, then $\psi_Y(t) = \psi_Y(tF') + \psi_Y(t(I - F)')$,
where

(i) F is idempotent and row-equivalent to B;

(ii) $\exp\{-\psi_Y[t(I - F)']\}$ is the characteristic function of a multinormal random
variable.

From the preceding lemma and the definition of nonnormality one 
easily obtains 
LEMMA 5. $nn(Y) = s$ if and only if s is the minimum rank of all matrices A
such that

$\psi_Y(t) = \psi_Y(tA') + f(t).$

Lemma 5 hence furnishes an alternate definition of nonnormality.
PROOF OF THEOREM 3. If $B \in \vartheta$, then $r(B) = s$. Let Y be any random variable
such that $L^*(\vartheta, F_Y)$, and let $t = nn(Y)$. Then

(1) $Y = XB + U.$

(a) From the relation L*, it follows that

(2) $\psi_Y(t) = \psi_X(tB') + \psi_U(t).$

Hence by Lemma 3, $\psi_Y(t) = \psi_Y(tC') + f(t)$, where C is idempotent and
row-equivalent to B. Therefore, by Lemma 5, $t \le s$.

(b) Assume $\theta$ is identifiable (L*). By Lemma 5 there exists a matrix A of rank
t such that $\psi_Y(t) = \psi_Y(tA') + f(t)$. Hence there exists a matrix F having the
properties enumerated in Lemma 4. But from (2) above, it follows that

$\psi_Y(tF') = \psi_X(t(BF)') + \psi_U(tF').$

Therefore, letting $B^* = BF$ and $U^* = UF + Y(I - F)$, one obtains that X
and U* are independent and U* has a multinormal distribution. Further,

$Y = XB^* + U^*,$

so that $L^*(\vartheta^*, F_Y)$ holds, where $\vartheta^*$ is the set having B* as an element. Now
$r(B^*) \le r(F) = t$. But since $\theta$ is identifiable, $S(\vartheta) \subset S(\vartheta^*)$, so that $r(B) \le r(B^*)$.
Whence $s \le t$, and part (a) then implies $s = t$, that is, the condition
of the theorem is necessary.

(c) Assume $\theta$ is not identifiable. In view of part (a) it is required to show that
$t < s$. By hypothesis, there exist random variables X* and U* and a matrix
$B^* \in \vartheta^*$ such that $L^*(\vartheta^*, F_Y)$,

(3) $Y = X^*B^* + U^*,$

and $S(\vartheta) \not\subset S(\vartheta^*)$. Equations (1) and (3) and Lemma 3 imply

(4) $\psi_Y(t) = \psi_Y(tC') + f(t) = \psi_Y(tC^{*\prime}) + f(t),$

where C and C* are idempotent and respectively row-equivalent to B and B*,
and have s and s* rows which are not identically zero. There exists a nonsingular
transformation P which reduces C* to a diagonal matrix $D^* = C^*P$ having only
ones and zeros on its main diagonal. Let $A = CP$; then (4) yields

(5) $\psi_Y(tA') = \psi_Y(tD^*A') + f(t).$

Equation (5) will be analyzed in three cases.
Case I, $r(D^*A') < s$. Let $G' = P'^{-1}D^*A'$. Then $r(G) < s$, and

$\psi_Y(tC') = \psi_Y(tG') + f(t).$

This, together with equation (4) and Lemma 5, implies $t \le r(G)$, so that $t < s$.

Case II, $r(D^*A') = s$, and there exists a diagonal matrix D having only ones
and zeros on the main diagonal such that

$r(D) = s, \quad r(DA') = s, \quad r(DD^*) < s.$

Substitution of $\sigma = tD$ for t in (5) gives

$\psi_Y(\sigma DA') = \psi_Y(\sigma DD^*A') + f(\sigma).$

The vector $\tau = \sigma DA'$ has exactly s components which are not identically zero.
Since $r(DA') = s$, there exists a matrix $\alpha$ such that $r(\alpha) = s$ and $\sigma = \tau\alpha$. Hence

$\psi_Y(\tau) = \psi_Y(\tau\alpha DD^*A') + f(\tau).$

Since $tA'$ has the same nonvanishing components as $\tau$,

$\psi_Y(tC') = \psi_Y(tH') + f(t),$

where $H' = P'^{-1}A'\alpha DD^*A'$. This, together with equation (4) and Lemma 5,
implies $t \le r(H)$, and since $r(H) \le r(DD^*) < s$, it follows that $t < s$.

Case III, $r(D^*A') = s$, and for every diagonal matrix D having ones and
zeros on the main diagonal, $r(DD^*) = s$ whenever $r(D) = s$ and $r(DA') = s$.

Let $a_{i_j}$, for $j = 1, \ldots, m$, be the row vectors of A' which are not identically zero.
Then by Lemma 1 each row $a_{i_j}$ is included in some s-rowed minor of A' of rank s.
That is, there exists a diagonal matrix $D_j$ such that $r(D_jA') = s$ and $r(D_j) = s$,
with elements $d_{j, i_j i_j} = 1$ for $j = 1, \ldots, m$. Since $r(D_jD^*) = s$, by hypothesis,
then $d^*_{i_j i_j} = 1$ for $j = 1, \ldots, m$. Hence it follows that $AD^* = A$. Since D* is
idempotent, then $S^*(A) \subset S^*(D^*)$. Here the notation $S^*(A)$ denotes the space
spanned by the row vectors of A. It then follows that $S^*(C) \subset S^*(C^*)$, and hence
$S(\vartheta) \subset S(\vartheta^*)$, contradicting the hypothesis that $\theta$ is not identifiable (L*). Case
III is therefore impossible.

This completes the proof of sufficiency for Theorem 3, as these three cases
exhaust the possible situations arising from (5). The corollaries below are easy
consequences of Theorem 3.

COROLLARY 1. Denoting XB by S, $\theta$ is not identifiable (L*) if and only if
$S = ZG + V$, where Z and V are independent, V has a multinormal distribution,
and $r(G) < r(B)$.

COROLLARY 2. If $X = ZG + V$ and $r(G) < r(B)$, then $\theta$ is not identifiable
(L*).

COROLLARY 3. If $\theta$ is identifiable (L*), then the nonnormality of X is not less
than s.

The expression of Theorem 3 in terms of the random variables X is more
natural if the problem is reformulated in an equivalent way [8]. Let $Y_n$ denote
a vector with n components and $B_{sn}$ a matrix with s rows and n columns, $s \le n$.
Let $\mathfrak{X}_s$ denote a set of s-dimensional distribution functions.

THEOREM 4. When the relation L** is characterized by the equation
$Y_n = X_sB_{sn} + U_n$ with $r(B_{sn}) = s$, then $\theta$ is identifiable (L**) if and only
if $nn(\mathfrak{X}_s) = s$.

Taking $n = 2$ and $s = 1$, one obtains the result of Reiersøl [7]. Exactly similar
results are obtained if the relation R is taken to be equality rather than
inclusion. Again, if $\Theta_0$ were chosen as the set whose elements are all the sets of
row-equivalent matrices, so that $\Theta_0 = \Theta$, then one would have:

THEOREM 5. $\theta$ is identifiable (L*) if no linear combination of the components of
X is normally distributed.


5. Estimation of linear structures. In this section, an estimate is constructed
which converges with probability one to the linear structure. An infinite sequence
of vector random variables $(Y_1, Y_2, \ldots)$ is considered. No assumption
whatever is made concerning the existence of moments of $Y_i$. Each $Y_i$ satisfies
Definitions 2 and 3; furthermore, $Y_i$ and $Y_j$ are independent if $i \ne j$. It is assumed
that s is known. For every N, let $Z_N = (Y_1, \ldots, Y_N)$. A function $T_N(Z_N)$ will
be constructed such that $P\{T_N(Z_N) \to S(B) \text{ as } N \to \infty\} = 1$, where $T_N$ is a
linear vector space and the convergence $T_N(Z_N) \to S(B)$ is defined in

DEFINITION 8. If $C_N$ and C are linear vector spaces, then $C_N \to C$ as $N \to \infty$,
provided $\Delta_N \to 0$ as $N \to \infty$, where $\Delta_N = \max_k \min_r |k - r|$ for all unit vectors
k in $C_N$, and all unit vectors r in C. The quantity $\Delta_N$ will be called the distance
between the sets $C_N$ and C.

Hence if $C_N$ and C are linear vector spaces and $C_N$ a random variable, then
$C_N$ converges almost surely to C if $P\{\Delta_N \to 0\} = 1$. A unit vector is here a vector
of length one.
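For linear spaces the distance $\Delta_N$ of Definition 8 has a closed form. For a unit vector k, the nearest unit vector r in C lies along the orthogonal projection $P_Ck$, so $\min_r |k - r| = \sqrt{2 - 2|P_Ck|}$; maximizing over unit k in $C_N$ then gives $\Delta_N = \sqrt{2 - 2\sigma_{\min}}$, where $\sigma_{\min}$ is the smallest singular value of $Q_C'Q_{C_N}$ for orthonormal bases Q (the cosine of the largest principal angle), and $\Delta_N = \sqrt{2}$ whenever $C_N$ has larger dimension than C. A NumPy sketch, with each space given by spanning rows and assumed nonzero; the names are hypothetical.

```python
import numpy as np


def orthonormal_rows_basis(rows, tol=1e-10):
    """Orthonormal basis (as columns) of the space spanned by the given rows."""
    A = np.atleast_2d(np.asarray(rows, dtype=float))
    _, s, vt = np.linalg.svd(A, full_matrices=False)
    rank = int(np.sum(s > tol * s[0])) if s.size else 0
    return vt[:rank].T


def subspace_distance(CN_rows, C_rows):
    """Delta of Definition 8: max over unit k in C_N of min over unit r in C of |k - r|."""
    QN = orthonormal_rows_basis(CN_rows)
    QC = orthonormal_rows_basis(C_rows)
    if QN.shape[1] > QC.shape[1]:
        return float(np.sqrt(2.0))       # some unit k in C_N is orthogonal to C
    cosines = np.linalg.svd(QC.T @ QN, compute_uv=False)
    smallest = min(float(cosines.min()), 1.0)   # guard against rounding above 1
    return float(np.sqrt(2.0 - 2.0 * smallest))


print(subspace_distance([[1, 1]], [[1, 0]]))   # sqrt(2 - sqrt(2)), about 0.765
```

Note that $\Delta_N$ is not symmetric in its arguments: a plane is at distance $\sqrt{2}$ from any line inside it, while the line is at distance 0 from the plane.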

DEFINITION 9. A matrix B is related to a random variable Y if $B \in \vartheta$ for some
$\vartheta$ such that $\vartheta \in \Theta_0$ and $\vartheta \in \gamma(F_Y)$ (cf. Definition 1).

From part (c) of Theorem 3 one obtains:

LEMMA 6. If $\psi_Y(t) = \psi_Y(tC_i') + f(t)$ for $i = 1, 2$, where $C_i$ is idempotent
with rank $s_i$, then

(i) either $nn(Y) < s_1$, or else $S(C_1) \subset S(C_2)$, and

(ii) either $nn(Y) < s_2$, or else $S(C_2) \subset S(C_1)$.

LEMMA 7. If $\theta$ is identifiable (L*), if B is related to Y, and if G is idempotent,
then $\psi_Y(t) = \psi_Y(tG') + f(t)$ if and only if $S(B) \subset S(G)$.

PROOF. Suppose $S(B) \subset S(G)$ and C is an idempotent matrix row-equivalent
to B. Then $\psi_Y(t) = \psi_Y(tC') + f(t)$, and $r(C) = nn(Y)$. Since $CG = C$, then
$\psi_Y(tG') = \psi_Y(tC') + f(t)$ and $\psi_Y(t) = \psi_Y(tG') + f(t)$.

Conversely, suppose $\psi_Y(t) = \psi_Y(tG') + f(t)$. Then, since $nn(Y) = r(C)$,
Lemma 6 implies $S(C) \subset S(G)$.

Lemmas 4 and 7 imply

LEMMA 8. If $\theta$ is identifiable (L*) and B is related to Y, then, for any idempotent
G such that $S(B) \subset S(G)$, there exists F idempotent and row-equivalent to G such
that $\psi_Y(t) = \psi_Y(tF') + \psi_Y(t(I - F)')$, and $Y(I - F)$ has a multinormal distribution.
LEMMA 9. If F is idempotent with rank $n - 1$, then $F = I - r'a/(ra')$ for unique
row vectors r and a.

From Lemmas 8 and 9, it follows that ψ_Y(t) = ψ_Y(tF') − ½σ²(ta')² if and only
if 𝔖(B) ⊂ 𝔖(F), where F is chosen as in Lemma 9. This property is made the
basis of a criterion to determine 𝔖(B). Letting

$$ L(t) = L(t; \alpha, r, a) = g_Y(t) - g_Y(tF')\,\alpha^{(ta')^2}, \qquad \alpha = \exp\left[-\tfrac12\sigma^2\right], $$

it follows that L(t) ≡ 0 if and only if 𝔖(B) ⊂ 𝔖(F) and α is suitably chosen.


Define G(r) = min_{α,a} ∫ L(t)L(−t) dX(t), where X(t) is a strictly increasing bounded
function and the integration is taken over the entire space. Then G(r) = 0 if
and only if r is orthogonal to 𝔖(B). Thus, if F_Y were known, an investigation
of the zeros of G(r) would yield explicit knowledge of 𝔖(B).

Determining a random variable which converges almost surely to G will
enable the desired estimate to be constructed. To this end the sample characteristic
function is defined by

$$ g_N(t; Z_N) = \frac{1}{N}\sum_{k=1}^{N} e^{\,i\,tY_k'} . $$

Then G_N(r; Z_N) is defined by replacing g_Y(t) by g_N(t; Z_N) in the definition of
G(r). The space C is complementary to S = 𝔖(B); that is, C is the space spanned by
the unit vectors r for which G(r) = 0.
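A direct transcription of the sample characteristic function g_N(t; Z_N) is sketched below. It assumes t and each observation Y_k are given as equal-length lists of reals; the function name is hypothetical. Since g_N(−t) is the complex conjugate of g_N(t), the product L(t)L(−t) in the definition of G(r) is the squared modulus |L(t)|².

```python
import cmath

def sample_char_fn(t, sample):
    """g_N(t; Z_N) = (1/N) * sum_k exp(i * t Y_k'), with t and each Y_k row vectors."""
    total = 0j
    for y in sample:
        total += cmath.exp(1j * sum(tj * yj for tj, yj in zip(t, y)))
    return total / len(sample)


sample = [[0.3, -1.2], [1.0, 0.4], [-0.7, 2.1]]   # hypothetical observations Y_1, Y_2, Y_3
print(sample_char_fn([0.5, -0.2], sample))
```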

The estimate T_N(Z_N) is defined to be the linear space orthogonal to the linear
vector space C_N spanned by the vectors k_1, k_2, ..., k_{n−s}. The vectors k_j are
defined by the following construction.

(i) k_1 is any unit vector for which G_N(k_1; Z_N) = min_r G_N(r; Z_N).

(ii) For j = 2, ..., n − s, k_j is any unit vector such that G_N(k_j; Z_N) =
min_{r∈O_j} G_N(r; Z_N), where O_j is the linear space orthogonal to k_1, ..., k_{j−1}.
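The greedy construction in steps (i) and (ii) can be sketched as follows. This is not the author's procedure. The exact minimisation over unit vectors is replaced by a crude random search, and this substitution is an assumption of the sketch. Each candidate is first projected onto the orthogonal complement of the vectors already chosen, which enforces the restriction r ∈ O_j. The criterion passed in stands in for G_N(r; Z_N), and the toy criterion used below is hypothetical.

```python
import math
import random

def _normalise(v):
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]

def _orthogonalise(v, basis):
    # Remove the components of v along each (orthonormal) vector already chosen.
    for b in basis:
        c = sum(x * y for x, y in zip(v, b))
        v = [x - c * y for x, y in zip(v, b)]
    return v

def sequential_minimisers(criterion, dim, count, n_trials=5000, seed=0):
    """Greedy choice of k_1, ..., k_count as in steps (i) and (ii).

    Each k_j (approximately) minimises `criterion` over unit vectors
    orthogonal to k_1, ..., k_{j-1}. Random search replaces the exact
    minimisation; this substitution is an assumption of the sketch.
    """
    rng = random.Random(seed)
    chosen = []
    for _ in range(count):
        best, best_value = None, math.inf
        for _ in range(n_trials):
            v = _orthogonalise([rng.gauss(0.0, 1.0) for _ in range(dim)], chosen)
            if math.sqrt(sum(x * x for x in v)) < 1e-12:
                continue  # degenerate draw; skip it
            r = _normalise(v)
            value = criterion(r)
            if value < best_value:
                best, best_value = r, value
        chosen.append(best)
    return chosen


# Hypothetical criterion: squared length of the projection onto S = span{e1, e2},
# which vanishes exactly on unit vectors orthogonal to S.
ks = sequential_minimisers(lambda r: r[0] ** 2 + r[1] ** 2, dim=3, count=1)
print(ks[0])  # close to (0, 0, 1) or (0, 0, -1)
```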

The proof that the estimate converges almost surely is based on the following
lemma, which is a corollary of a theorem of Rubin [10].

Lemma 10. For any finite cell T of Euclidean n-dimensional space,

P{lim_{N→∞} g_N(t; Z_N) = g_Y(t) uniformly for t ∈ T} = 1.

Taking F as in Lemma 9, and since α^u is bounded for α ∈ [0, 1] and real u ≥ 0,
it follows that

P{lim_{N→∞} L_N(t) = L(t) uniformly for t ∈ T and r, a, α} = 1.

Here L_N(t) is defined by replacing g_Y(t) by g_N(t; Z_N) in the definition of L(t).
From this, since r, a, α range over compact sets, it follows that

Lemma 11. P{lim_{N→∞} G_N(r; Z_N) = G(r) uniformly in r} = 1.

Lemma 12. If Δ_N is the distance between C_N and C, then P{Δ_N → 0 as N → ∞} =
1, provided θ is identifiable.
Proof. For any η > 0, let C_η be the set of unit vectors k such that
min_{r∈C} |k − r| < η, where r is a unit vector. For any η > 0, take 0 < ε < ξ/2,
where

ξ = min G(r) with r not a member of C_η.

Then there exists N_ε such that, for every r and all N > N_ε, |G_N(r; Z_N) − G(r)|
< ε with probability one and, hence, both

min G_N(r; Z_N) with r not a member of C_η ≥ ξ − ε > ξ/2,

min G_N(r; Z_N) with r a member of C < ε < ξ/2.

Therefore, if k satisfies min_r G_N(r; Z_N) = G_N(k; Z_N), it follows that k ∈ C_η,
and hence k_1 ∈ C_η.

It can be shown similarly that, if n − s ≥ 2, then k_2 ∈ C_η, since in this case
there must be a unit vector r such that r ⊥ k_1 and r ∈ C. Likewise it can be shown
by induction that k_j ∈ C_η for j = 1, ..., n − s.

Let k be any unit vector in C_N. Then k = Σ_{j=1}^{n−s} d_j k_j, and kk' = 1 implies
Σ_{j=1}^{n−s} d_j² = 1. Since k_j ∈ C_η, there are vectors r_j ∈ C such that |k_j − r_j| < η for
j = 1, ..., n − s. Then

|k − Σ_{j=1}^{n−s} d_j r_j| ≤ Σ_{j=1}^{n−s} |d_j| |k_j − r_j| < η√(n − s).

Hence, for any η there exists N_η such that C_N ⊂ C_η, provided N > N_η, and
therefore Δ_N < η with probability one.

The following lemma is straightforward.

Lemma 13. C_N converges to C if and only if S_N converges to S, where C_N and
C are the complements of S_N and S, respectively.

Lemmas 12 and 13 then imply

Theorem 6. If θ is identifiable (L*), then the estimate T_N(Z_N) converges almost
surely to 𝔖(B).
ASYMPTOTIC BEHAVIOR OF SOME RANK TESTS FOR ANALYSIS
OF VARIANCE¹


By FRED C. ANDREWS²
Stanford University


1. Summary. The H test and the median test have been proposed for testing 
the hypothesis of the equality of c probability distributions. Assuming transla- 
tion-type alternatives, we find the limiting distributions of the H and median 
test statistics. These results are used to derive general formulas for the asymptotic 
relative efficiencies of these tests with respect to one another and to the classical 
F test. A short discussion of the computation of approximate power functions 
of these tests is also included. 


2. Introduction. A few tests of a non-parametric nature have been proposed
for the problem of testing the equality of c probability distributions, the so-called
c-sample problem. Tests for the two-sample problem have been proposed
by Wilcoxon [22], Mann and Whitney [11], J. Westenberg [21], and Mood and
Brown [12], among others. Consistency and power properties of some of these
tests have been discussed by van Dantzig [3], Lehmann [8], [9], and Mood [13],
among others.

The H test proposed by Wallis and Kruskal [20] is a direct generalization of
the two-sided Wilcoxon test discussed in detail by Mann and Whitney [11].
The H test is similar to a classical F test, with ranks replacing the original
observations. The Mood-Brown median test [12] makes use of the construction
of a 2-by-c table and the resulting large-sample theory thereof. Pitman [16]
derives the general formula for the asymptotic relative efficiency of the Wilcoxon
test with respect to the ordinary t test when quite general translation-type
alternative hypotheses are assumed. Mood [13] assumes only normal alternative
hypotheses and computes the asymptotic relative efficiencies, with respect
to the t test, to be 3/π for the Wilcoxon test and 2/π for the median test;
the former is a special case of the Pitman result.

After setting up suitable alternative hypotheses and finding the limiting 
distributions of the relevant statistics, we find general formulas for the asymp- 
totic relative efficiencies in the c-sample case for translation alternatives but 
almost arbitrary distributions. These formulas do not in general depend on c. 

Mood [12] and Kruskal [7] derive the limiting distributions of their respective
statistics in the case of the hypothesis of equal distributions. The methods used
here to derive the distributions under the alternative hypothesis duplicate their
results.
A rather complete bibliography on nonparametric c-sample tests and a discussion
of the rationale for applying them is given by Wallis and Kruskal [20].

The c-sample problem may be expressed formally as follows. Let {X_ij}, for
i = 1, 2, ..., c and j = 1, 2, ..., n_i, be a set of independent random variables.
The probability distribution function of X_ij is denoted by F_i, so that F_i(x) is
the probability of the event [X_ij ≤ x]. The set of admissible hypotheses designates
that each F_i belongs to some class of distribution functions Ω. The hypothesis
to be tested, say K_0, specifies that F_i is an element of Ω for each i and that
furthermore F_1(x) = F_2(x) = ... = F_c(x) for all real x. Alternative to K_0 is the
hypothesis that each F_i belongs to Ω but that K_0 does not hold. To avoid the
problem of ties, it is assumed throughout that the class Ω is the class of continuous
distribution functions.

So as to pay particular attention to translation-type alternatives, the class of
admissible hypotheses will be limited to include only those hypotheses which
state that F_i(x) = F(x + ε_i) for all i = 1, 2, ..., c, for some arbitrary choice
of F in the class Ω and real numbers ε_1, ..., ε_c. It is seen immediately that
specifying all ε_i = 0 yields hypothesis K_0, while if ε_i ≠ ε_j for some pair (i, j)
then an alternative to K_0 is given.

The H test is based on the statistic

$$ H = \frac{12}{N(N+1)}\sum_{i=1}^{c} n_i\left(\bar R_i - \frac{N+1}{2}\right)^2, \tag{1} $$

where R̄_i is the average rank of the members of the ith sample obtained after
ranking all of the N = Σ n_i observations. The test consists in rejecting K_0
at a significance level α if H exceeds some predetermined number h_α. Kruskal
[7] proves that if K_0 is true, the statistic H has a limiting chi square distribution
with c − 1 degrees of freedom as all n_i → ∞ simultaneously. This provides the
user of this H test with a large sample approximation of the value of h_α for any
0 < α < 1.
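Formula (1) can be computed directly, as in the sketch below. The sketch assumes that there are no ties, which matches the text's restriction to continuous distribution functions. Each sample is a list of numbers, and the function name is hypothetical.

```python
def kruskal_wallis_h(samples):
    """H of (1): 12/(N(N+1)) * sum_i n_i * (Rbar_i - (N+1)/2)^2.

    Assumes no ties (continuous F), as in the text; ranks run from 1 to N.
    """
    pooled = sorted((x, i) for i, s in enumerate(samples) for x in s)
    rank_sums = [0.0] * len(samples)
    for rank, (_, i) in enumerate(pooled, start=1):
        rank_sums[i] += rank
    n_total = len(pooled)
    centre = (n_total + 1) / 2.0
    total = sum(len(s) * (rank_sums[i] / len(s) - centre) ** 2
                for i, s in enumerate(samples))
    return 12.0 / (n_total * (n_total + 1)) * total


print(kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # mean ranks 2, 5, 8
```

With mean ranks 2, 5 and 8 and (N + 1)/2 = 5, the sum in (1) is 54, so H = 12 · 54/90 = 7.2 up to rounding.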

The Mood-Brown median test is based on the statistic

$$ M = \frac{N(N-1)}{b(N-b)}\sum_{i=1}^{c}\frac{1}{n_i}\left(m_i - \frac{n_i b}{N}\right)^2, \tag{2} $$

where N = Σ n_i, and b = ½(N − 1) when N is odd and b = ½N when N is
even, while m_i denotes the number of observations in the ith sample which are
less than the median of all of the observations. Mood proves that whenever K_0 is
true, the statistic M has a limiting chi square distribution with c − 1 degrees
of freedom as all n_i → ∞ simultaneously. The median test is then to reject K_0
at the level of significance α whenever M exceeds a number m_α. Use of the limiting
distribution allows one to determine an approximate value for m_α for large
samples.
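Statistic (2) can be sketched in the same way. The sketch assumes no ties and takes the pooled median to be the middle observation when N is odd and the average of the two middle observations when N is even, so that b observations fall below it. The function name is hypothetical.

```python
def mood_median_m(samples):
    """M of (2): N(N-1)/(b(N-b)) * sum_i (m_i - n_i b / N)^2 / n_i."""
    pooled = sorted(x for s in samples for x in s)
    n_total = len(pooled)
    if n_total % 2 == 1:
        median = pooled[n_total // 2]
        b = (n_total - 1) / 2.0
    else:
        median = (pooled[n_total // 2 - 1] + pooled[n_total // 2]) / 2.0
        b = n_total / 2.0
    total = 0.0
    for s in samples:
        n_i = len(s)
        m_i = sum(1 for x in s if x < median)  # observations below the pooled median
        total += (m_i - n_i * b / n_total) ** 2 / n_i
    return n_total * (n_total - 1) / (b * (n_total - b)) * total


print(mood_median_m([[1, 2, 3], [4, 5, 6, 7]]))  # N = 7, b = 3, m = (3, 0)
```

In the example, N(N − 1)/(b(N − b)) = 42/12 and the sum equals 12/7, so M = 6 up to rounding.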

Because both the H test and the median test are consistent against translation
alternatives (their power against any fixed alternative tends to one), the
distributions of H and M will be studied in the following sections under a
sequence of admissible alternative hypotheses K_n, n = 1, 2, ....

The hypothesis K_n specifies, for each i = 1, 2, ..., c, that F_i(x) = F(x + θ_i/√n),
with F ∈ Ω but not specified further, and, for some pair (i, j), that θ_i ≠ θ_j. The
letter n will be used to index a sequence of situations in which K_n is the true
hypothesis. Limiting probability distributions will then be found as n → ∞.
The problem will be so formulated that N will be proportional to n. For large n
the hypothesis K_n is "near" K_0, so that this type of limit process provides a
way of studying the effect of small translations on these tests.

The notation χ²_r(λ) will denote the possibly noncentral chi square distribution
with r degrees of freedom and noncentrality parameter λ. Thus χ²_r(λ) is the probability
distribution of the sum of r squares of independent normal random variables
whose variances are all unity and whose sum of squared expectations is
denoted by λ. For λ = 0 we see that χ²_r(0) is the ordinary chi square distribution.
The χ²_r(λ) distribution has been studied and used by Fisher [4], Tang [19], and
Patnaik [15], among others. A partial tabulation of some percentage points of
χ²_r(λ) is given by Fix [5].
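The definition of χ²_r(λ) translates directly into a simulation. The sketch below draws sums of squares of independent unit-variance normals with hypothetical means, and checks the standard moments of χ²_r(λ): mean r + λ and variance 2r + 4λ. The seed and sample size are arbitrary choices.

```python
import random

def simulate_chi2_r_lambda(means, n_draws, seed=0):
    """Draws of the sum of squares of independent N(mean_j, 1) variables, j = 1..r."""
    rng = random.Random(seed)
    return [sum(rng.gauss(mu, 1.0) ** 2 for mu in means) for _ in range(n_draws)]


means = [1.0, 1.0, 0.0]            # r = 3, lambda = 1 + 1 + 0 = 2
draws = simulate_chi2_r_lambda(means, 20000)
mean_draw = sum(draws) / len(draws)
var_draw = sum((d - mean_draw) ** 2 for d in draws) / (len(draws) - 1)
print(mean_draw, var_draw)         # near r + lambda = 5 and 2r + 4 lambda = 14
```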


3. The limiting distribution of H under hypothesis K_n. The purpose of this
section is to prove

Theorem 3.1. For each index n assume that n_α = s_α n, with s_α a positive integer,
and the truth of hypothesis K_n. If for any real number t,

$$ \lim_{n\to\infty}\sqrt{n}\int_{-\infty}^{+\infty}\{F(x+t/\sqrt{n})-F(x)\}\,dF(x) $$

exists finite, then, for n → ∞, the limiting distribution of the statistic H is χ²_{c−1}(λ^H),
where

$$ \lambda^H = 12\Big(\sum_{i=1}^{c}s_i\Big)^{-2}\sum_{\alpha=1}^{c}s_\alpha\Big\{\sum_{\beta=1}^{c}s_\beta\lim_{n\to\infty}\sqrt{n}\int_{-\infty}^{+\infty}\Big[F\Big(x+\frac{\theta_\alpha-\theta_\beta}{\sqrt{n}}\Big)-F(x)\Big]dF(x)\Big\}^2. \tag{3} $$


From (1) and definitions (5), (6), and (9) below one can write 


(4) H = 12 /(X 0X 8; + ‘| x Sa | vr (u" -4 % “)T. 


The proof of Theorem 3.1 then quite naturally depends upon showing that 
the random variables 


Vn (U" -43 > a), 


telga 8, 


have a certain joint limiting normal distribution as n → ∞. The methods used
in the proof are mainly adaptations of results of Hoeffding [6] and Lehmann
[8].
We begin the proof by defining the functions h^α by


c 


a ’ 83 
2) h (yn, midi Oh Sit » Ye) - 2.= 5(ys y Yas 
p=l Sq 
with the convention that δ(y_β, y_α) = 1 whenever y_β < y_α, and is otherwise
zero. Throughout this discussion α will range over the integers 1, 2, ..., c.
Recalling that n_i = s_i n, we construct for k = 1, 2, ..., n the random vectors

(6) X_k = (X_{1,(k−1)s_1+1}, ..., X_{1,ks_1}, ..., X_{c,(k−1)s_c+1}, ..., X_{c,ks_c}),


and the random variables g^α, U^α, and U'^α, defined by

_ a r yr l r r 

(7) OE: , 668 EQ ee gee 2 NK y, °° Xa.) 
(8, 8 *** 8 )(c!) 


with the summation extending over all indices (j_1, ..., j_c) in such a manner that
the arguments of a single h^α are components of distinct vectors;
2 > . n 
(8) U* = DeXa Xoo Kad /(”) 
where the summation extends over all indices 1 ≤ β_1 < β_2 < ... < β_c ≤ n; and
a 1 

(9) OT on ein 

MMe *** Ne jy—1 je=l 


= Tipe x W(X ij, ’ Xj, oo X ej,)> 


Then U'^α is recognized as the average of all kinds of h^α terms, while U^α is an
average of only those h^α terms in which the arguments of a given h^α are each
elements of a different vector. Setting j^α equal to the sum of all h^α terms appearing
in U'^α but not in U^α, we have


l ( ") . 
1 a ee a ign 3 
Mm Ne *** Ne (° + f 


( / 

n . \ 
- -¢ Jet = mms +++ me UY & £r. 
MMe °°° Ne | c j 


Adopting a method of proof given by Lehmann ([8], p. 168), we use the inequality
(Σ_{i=1}^k a_i)² ≤ k Σ_{i=1}^k a_i² for real numbers a_i, and the fact that


c 2 
vedere . . 8 
Eth (Xi), » Bes eee ean (> ry. 
fm Sa 
Thus we establish that 


E(\/n D*)’ s 4 (> $s 


B=1 8a 
With the notation W^α = √n(U'^α − EU'^α) and Z^α = √n(U^α − EU^α), we have
E(W^α − Z^α)² = E(√n D^α)² → 0.


Applying Lemma 7.1 of Hoeffding ([6], p. 305), we have

Lemma 3.1. If either of the random vectors W = (W^1, W^2, ..., W^c) or
Z = (Z^1, Z^2, ..., Z^c) has a limiting probability distribution as n → ∞, then the
other random vector has this same limiting distribution as n → ∞.

The next step in the proof is to compare the random vector Z with the
random vector Y = (Y^1, Y^2, ..., Y^c), whose components are defined by


Y = (c iV/n) or wi(X,), with 
V5 (21,22, ‘++ .2j) = Ey" (x; , 22, he's , te Bises i Raees one a on 
— FEy*(X,, X2,--°- , Xe). 


The functions ψ_j^α(x_1, x_2, ..., x_j) are the same as those defined by Hoeffding
[6], except that they are applied to this special problem. Now Hoeffding ([6],
p. 299, (5.13)) has shown that


—] . 
E(Z*)? = no*(U*) = en - . > °) ar + R%., 


n\"' < @ (” —_ ;) 
—z a ( ; 
a n(") 2. he <4 aj, = Biy5(X1,X 


By expanding binomial coefficients we calculate 


=H 49(6) Mfr -SS 4] Smet 5-0 


j=? = k 


however, aj S 4( > par 83/8.) for all a,j so that Ri. > Oasn — ~. Referring 
to Hoeffding ([6], p. 308, (7.10) and (7.12)), we find that


E(Y*) = E(Y°Z*) =ca;. 
Substitution yields 


E(Y* — Z*) = E(Y*) + E(Z*)’ — 2E(Z*Y") = R%, 


_1 
op en (") (" ‘) _ é| ai—0 asn— %., 
c =] 


Another application of Lemma 7.1 of Hoeffding ([6], p. 305) produces

Lemma 3.2. If either of the random vectors Z or Y has a limiting probability
distribution as n → ∞, then the other random vector has this same limiting
distribution as n → ∞.

It now remains to find the limiting distribution of Y. Each Y^α is a sum of
independent and identically distributed random variables,


—Dvi(X);  B(") = 
Also, |ψ_1^α(X_j)| ≤ 2(Σ_{β=1}^c s_β/s_α) with probability one. Adopting the notation
for the column vectors s = (s_1, ..., s_c)', a vector of real numbers, and
ψ_1 = (ψ_1^1, ψ_1^2, ..., ψ_1^c)', the characteristic function of Y is expressed as


f_n(s) = E(e^{is'Y}) = [E exp{ic s'ψ_1/√n}]^n,


because of independence. Taking logarithms, expanding the real and imaginary
parts of the exponential in finite Taylor series, using the almost sure boundedness
of ψ_1(X_j), noting that E{ψ_1(X_j)} = 0, and finally expanding the logarithm
in a finite Taylor series produces the usual type of result that

log f_n(s) = −½c² s'[E(ψ_1ψ_1')]s + O(n^{−1/2}) as n → ∞

for any fixed real vector s. From the continuity theorem for characteristic functions
([2], p. 96), we conclude

Lemma 3.3. The random vector Y has a limiting normal distribution with E(Y)
the zero vector and variance-covariance matrix Σ = lim_{n→∞} c² E(ψ_1ψ_1').

Adopting the notation 


y ees S| Fexe) - / F(z) arta) | => | Fa(%) a | F(z) asta) |, 


8 88 j=l 


. 
we can recognize ¥1(X,) = (1 c) > pn (88/8e)Aga. A lengthy computation and 


an application of the Lebesgue bounded convergence theorem, in view of the
boundedness of each F_i and lim_{n→∞} F_i(x) = F(x), yields the result that


> = lim CE (ivi) = Ke | : (> *) (ss > 2 1). 
n-2 8a \jml 58 j=l 88 

Combining the previous three lemmas produces

Lemma 3.4. If for each index n the hypothesis K_n is valid and W^α denotes the
random variable √n(U'^α − EU'^α), then the random vector W = (W^1, W^2, ..., W^c)
has a limiting normal distribution with zero mean vector and variance-covariance
matrix Σ.

Recalling W^α = √n(U'^α − E(U'^α)) and (4), and letting


‘= Vn Ee -4> “|, 


i=l 8a 


we write H as 


$$ H = 12\Big[\Big(\sum_{i=1}^{c}s_i\Big)\Big(\sum_{i=1}^{c}s_i+\frac{1}{n}\Big)\Big]^{-1}\sum_{\alpha=1}^{c}s_\alpha(W^\alpha+m^\alpha)^2. $$


Now H will have the same limiting distribution as 


$$ H^* = 12\Big(\sum_{i=1}^{c}s_i\Big)^{-2}\sum_{\alpha=1}^{c}s_\alpha(W^\alpha+m^\alpha)^2, $$
but because Σ_α n_α R̄_α = ½(Σ_α n_α)(Σ_α n_α + 1), we have Σ_α s_α W^α = O(n^{−1/2}) as
n → ∞. So, except for terms of higher order,

$$ H^* = 12\Big(\sum_{i=1}^{c}s_i\Big)^{-2}\sum_{\alpha,\beta=1}^{c-1}\Big(s_\alpha\delta_{\alpha\beta}+\frac{s_\alpha s_\beta}{s_c}\Big)(W^\alpha+m^\alpha)(W^\beta+m^\beta). $$

We recognize the matrix of the quadratic form H* as the inverse of the limiting
variance-covariance matrix of the random variables W^1, W^2, ..., W^{c−1}.
Lemma 3.5. If the vector x has a normal distribution with mean vector μ and
non-singular variance-covariance matrix A, then the quadratic form x'A^{-1}x has a
χ²_r(λ) distribution, with λ = μ'A^{-1}μ and r the rank of A.
A proof of this lemma is given by Rao ([17], p. 57). We now calculate


lim m* = > * lim vn | | F (2 4 % = ~) ee Fa) | dF(z), 


na Bel 8a n--o vV n 


and combine Lemmas 3.4 and 3.5 with a theorem of Mann and Wald ([10],
p. 223) to complete the proof of Theorem 3.1.
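Lemma 3.5 can be checked by simulation in two dimensions. The sketch draws x ~ N(μ, A) through a Cholesky factor of A. It then compares the average of x'A^{-1}x with the χ²_r(λ) mean r + λ, where λ = μ'A^{-1}μ and r = 2. The values of μ and A are hypothetical toy choices, and the function assumes A is symmetric positive definite.

```python
import math
import random

def quadratic_form_draws(mu, a, n_draws, seed=0):
    """Draws of x' A^{-1} x for x ~ N(mu, A), A a symmetric positive definite 2x2 matrix."""
    (a11, a12), (_, a22) = a
    # Cholesky factor L (lower triangular) with A = L L'.
    l11 = math.sqrt(a11)
    l21 = a12 / l11
    l22 = math.sqrt(a22 - l21 ** 2)
    det = a11 * a22 - a12 * a12
    i11, i12, i22 = a22 / det, -a12 / det, a11 / det   # entries of A^{-1}
    rng = random.Random(seed)
    out = []
    for _ in range(n_draws):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x1 = mu[0] + l11 * z1
        x2 = mu[1] + l21 * z1 + l22 * z2
        out.append(i11 * x1 * x1 + 2.0 * i12 * x1 * x2 + i22 * x2 * x2)
    return out


mu = (1.0, -1.0)
A = ((2.0, 0.5), (0.5, 1.0))
lam = 4.0 / 1.75          # mu' A^{-1} mu for these toy values
draws = quadratic_form_draws(mu, A, 20000)
print(sum(draws) / len(draws), 2 + lam)   # simulated mean vs r + lambda
```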
In many instances λ^H can easily be computed with the aid of

Lemma 3.6. If the distribution function F possesses a continuous derivative
F' except at most on a set S where ∫_S dF(x) = 0, and if there exists a function g
which bounds the difference quotient, |[F(x + δ) − F(x)]/δ| ≤ g(x), for which
∫ g(x) dF(x) < ∞, then

$$ \lim_{n\to\infty}\sqrt{n}\int_{-\infty}^{+\infty}[F(x+\theta/\sqrt{n})-F(x)]\,dF(x) = \theta\int_{-\infty}^{+\infty}F'(x)\,dF(x). $$

This lemma is proved by a direct application of the Lebesgue bounded convergence
theorem and the definition of the derivative. In the event that the
conditions of Lemma 3.6 are satisfied, then

$$ \lambda^H = 12\left[\int_{-\infty}^{+\infty}F'(x)\,dF(x)\right]^2\sum_{\alpha=1}^{c}s_\alpha(\theta_\alpha-\bar\theta)^2, \qquad \bar\theta = \sum_{\alpha=1}^{c}s_\alpha\theta_\alpha\Big/\sum_{\alpha=1}^{c}s_\alpha. $$
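Under Lemma 3.6 the only property of F that enters λ^H is ∫F'(x) dF(x) = ∫f(x)² dx, where f = F'. The sketch below approximates this integral by Simpson's rule on a finite interval and then evaluates λ^H. The integration range, the step count and the sample-size ratios s and shifts θ in the example are assumptions. For the standard normal the integral is 1/(2√π), and for the uniform distribution on the unit interval it is 1.

```python
import math

def integral_f_squared(pdf, lo, hi, steps=20000):
    """Simpson's-rule value of int F'(x) dF(x) = int f(x)^2 dx over [lo, hi]."""
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of panels
    h = (hi - lo) / steps
    total = pdf(lo) ** 2 + pdf(hi) ** 2
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * pdf(lo + k * h) ** 2
    return total * h / 3.0

def lambda_h(pdf, lo, hi, s, theta):
    """lambda^H = 12 [int F' dF]^2 * sum_a s_a (theta_a - theta_bar)^2 under Lemma 3.6."""
    theta_bar = sum(si * ti for si, ti in zip(s, theta)) / sum(s)
    spread = sum(si * (ti - theta_bar) ** 2 for si, ti in zip(s, theta))
    return 12.0 * integral_f_squared(pdf, lo, hi) ** 2 * spread

def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def uniform_pdf(x):
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


print(integral_f_squared(std_normal_pdf, -12.0, 12.0))              # about 1 / (2 sqrt(pi))
print(lambda_h(uniform_pdf, 0.0, 1.0, [1, 1, 1], [0.0, 1.0, 2.0]))  # 12 * 1 * 2 = 24
```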


4. The limiting distribution of M under hypothesis K_n. The purpose of this
section is to derive the limiting distribution of the statistic M as n → ∞. The
result is stated in

Theorem 4.1. Assume for each index n = 1, 2, ... the validity of hypothesis K_n,
that F has a continuous derivative F' at its median a, and that n_i = s_i n for each
i = 1, 2, ..., c, with s_i a positive integer. With these assumptions the limiting
distribution of M is χ²_{c−1}(λ^M) with

$$ \lambda^M = 4[F'(a)]^2\sum_{i=1}^{c}s_i(\theta_i-\bar\theta)^2, \qquad \bar\theta = \sum_{i=1}^{c}s_i\theta_i\Big/\sum_{i=1}^{c}s_i. \tag{10} $$
The proof of the theorem is a generalization of a type of proof sketched by
Mood [13] in his discussion of the two-sample problem. Because the two cases
N odd and N even require slight differences in exposition, only the proof for N
odd will be given here. A similar proof for N even could readily be constructed.
In this case (N odd),

$$ M = \frac{4}{1+1/N}\sum_{i=1}^{c}n_i\left(\frac{m_i}{n_i}-\frac12+\frac{1}{2N}\right)^2. $$

Defining the random variables v_i = √n_i[(m_i/n_i) − ½] permits M to be written

$$ M = \frac{4}{1+1/N}\sum_{i=1}^{c}\left(v_i^2+\frac{\sqrt{n_i}}{N}\,v_i+\frac{n_i}{4N^2}\right). $$

Provided that we can demonstrate that v_i has a limiting distribution, since
√n_i/N, n_i/N², and 1/N all converge to zero as n → ∞, M will have the same
limiting distribution as the statistic 4Σ v_i². The first part of the proof consists in
proving

Lemma 4.1. Assuming the hypothesis of Theorem 4.1, the limiting distribution of
the vector (v_1, ..., v_{c−1}) is normal with E(v_i) = F'(a)√s_i(θ_i − θ̄) and covariance
matrix A given by

A^{-1} = {4(δ_ij + √(s_i s_j)/s_c)}, i, j = 1, 2, ..., c − 1,

where δ_ij is the usual Kronecker delta.

Let r_1, ..., r_c be a set of independent random variables, each with a uniform
probability distribution on the unit interval, and let

v'_i = √n_i[(m_i + r_i)/n_i − ½], i = 1, 2, ..., c.


The difference v'_i − v_i tends to zero in probability and so, by a well-known theorem
([2], p. 299), the vectors (v_1, ..., v_c) and (v'_1, ..., v'_c) possess the same limiting
distribution if they have one at all. Because the v_i are discrete while the v'_i are
continuous random variables, it is easier to examine the limiting distribution of
(v'_1, ..., v'_c).

Denoting by Z the median of all the samples combined, the probability of
the joint event m_1 = a_1 and m_2 = a_2 and ... and m_{c−1} = a_{c−1} and z_1 ≤ Z ≤ z_2 is


Pim, =a, °°: M141, 2352 2] 


ye 
> | Oe Fal T oy (Fi(z)}7{1 — Fyl2)}" dz 
i=1 zi l —_ F(z) 1 


a; 


for Σ a_i = ½(N − 1) with a_i a nonnegative integer, and is zero otherwise.
Writing m'_i = m_i + r_i, and square brackets to indicate the "largest integer
contained in," we see that the joint probability density function of the random
variables m'_1, ..., m'_{c−1}, Z is


, / ‘ “ 7 5 n 7 m nj—-([m 
g(m,,**+,M.4,2) = a fl ere) ( 1) tr, (2) {1 — Fy(z)}re is i) 
i=l =o j=l 
for Σ [m'_i] = ½(N − 1), and otherwise zero. With the transformation
w = √n(Z − a), v'_j = √n_j[(m'_j/n_j) − ½], j = 1, 2, ..., c − 1,
the probability density of (v'_1, ..., v'_{c−1}, w) becomes


, 5 “ (d;](nin.) ae “( w \ 
h(y, +++, ve1,W) = . —_——— F;\(a+ — 
’ rs ae —F (a + w/Vn) Val 


«I vm (11) \F (« oT >} Fi (« + ra 


where d_j = ½n_j + v'_j√n_j and square brackets indicate the "largest integer
contained in."

Noting that Σ_{i=1}^c √s_i v'_i = o(1) as n → ∞, employing Stirling's formula for
log n!, and using series expansions and the continuity of F' at x = a, we compute


; ' 1 \' 2 "V/s 
lim A(vy, +++, er, Ww) = ( =] Vv : 
no V/ 24 2. 


x exp{ 42 Ale — F'(a)V/5;(6; ~ /\ 


) 


2F’(a)V/s 


x 
V Qe 


7 ‘ . 

exp {—2(F (a))'s(@ — w)°}, 

where s = s_1 + s_2 + ... + s_c. Letting A_1 denote the variance-covariance
matrix of (v'_1, ..., v'_{c−1}), we find


8.)( 8.65; + V/ 8\8;)}, ‘/9 Lm **,¢@— f, 


Applying a theorem of Scheffé [18] yields the result that the limiting distribution
of (v'_1, ..., v'_{c−1}, w) is the foregoing normal distribution. Integrating
out the variable w, we obtain the desired limiting probability distribution of
(v'_1, ..., v'_{c−1}) and hence of (v_1, ..., v_{c−1}), which proves Lemma 4.1.

Earlier in this section it was remarked that if (v_1, ..., v_c) has a limiting
distribution, then M has the same limiting distribution as

$$ 4\sum_{i=1}^{c}v_i^2 = 4\sum_{i,j=1}^{c-1}\left(\delta_{ij}+\frac{\sqrt{s_is_j}}{s_c}\right)v_iv_j + \eta. $$

However, η tends to zero in probability, since Σ_{i=1}^c √s_i v_i = o(1) is satisfied with
probability one. We recognize then that, except for the term η, 4Σ v_i² is equal
to the quadratic form in the limiting distribution of (v_1, ..., v_{c−1}), provided
that the means are shifted to zero. As in Section 3, we employ Lemma 3.5 to
obtain the main Theorem 4.1 of this section.
5. Asymptotic relative efficiency. The concept of asymptotic relative efficiency
of one consistent test with respect to another is due to Pitman [16]. An application
and account of this method of comparing consistent tests is presented
by Noether ([14], p. 241). Briefly, the idea of asymptotic relative efficiency is
to choose a sequence of alternative hypotheses which vary with the sample
sizes in such a manner that the powers of the two tests for this sequence of
alternatives have a common limit less than one. The comparison of the two
tests is then made on a sample size basis.

To be more definite, suppose that two consistent tests T and T' require N
and N' observations, respectively, to attain the power β at level of significance
α for testing the hypothesis K_0 against hypothesis K_n. The difference in the
sample sizes N and N' results from the fact that we demand that the tests
yield a common power for a given alternative K_n. The asymptotic relative
efficiency of T' with respect to T is defined to be

lim_{N→∞} N/N' = lim_{n→∞} n/n' = e_{T',T}(α, β, K_0, {K_n}).

The asymptotic relative efficiency of the median test with respect to the H
test is stated in

Theorem 5.1. If n_i = s_i n and if the distribution function F has the two properties,

(i) F' is continuous at its median a, and

(ii) lim_{n→∞} √n ∫_{−∞}^{+∞} [F(x + θ/√n) − F(x)] dF(x) exists,

then the asymptotic relative efficiency of the median test with respect to the H test
for testing the hypothesis K_0 against K_n is

$$ e_{M,H} = \frac{4\Big(\sum_{j=1}^{c}s_j\Big)^2[F'(a)]^2\sum_{i=1}^{c}s_i(\theta_i-\bar\theta)^2}{12\sum_{\alpha=1}^{c}s_\alpha\Big\{\sum_{i=1}^{c}s_i\lim_{n\to\infty}\int_{-\infty}^{+\infty}\sqrt{n}\Big[F\Big(x+\frac{\theta_\alpha-\theta_i}{\sqrt{n}}\Big)-F(x)\Big]dF(x)\Big\}^2}. \tag{11} $$

To prove Theorem 5.1, let n' and n index the sample sizes for the H test and
the median test, respectively. The alternative hypothesis K_n states that
F_i(x) = F(x + θ_i/√n) and so is characterized by the numbers θ_i/√n. If the
level of significance is fixed at α and the limiting power fixed at β, then, since
from Theorems 3.1 and 4.1 H has a limiting χ²_{c−1}(λ^H) distribution and M has a
limiting χ²_{c−1}(λ^M) distribution under K_n, we must have λ^H = λ^M to achieve the
same limiting power for the two tests. To have the same alternatives for each
test we must have θ_i/√n = θ'_i/√n'. The substitution θ'_i = θ_i√(n'/n) in (10),
along with the requirement λ^H = λ^M (to guarantee equal power), yields formula
(11), which proves Theorem 5.1.

Corollary 5.1. If, in addition to the hypothesis of Theorem 5.1, the hypothesis
of Lemma 3.6 is assumed, then

$$ e_{M,H} = [F'(a)]^2\Big/3\left[\int_{-\infty}^{+\infty}F'(x)\,dF(x)\right]^2. $$

Here e_{M,H} does not depend upon α, β, (θ_1, ..., θ_c), or c, but is a function of F
only.

The comparison of the H test with respect to the ordinary analysis of variance
𝔉 test is contained in

Theorem 5.2. If the distribution function F satisfies the conditions of Lemma
3.6 and σ_F² = ∫_{−∞}^{+∞} x² dF(x) − [∫_{−∞}^{+∞} x dF(x)]² exists, then

$$ e_{H,\mathfrak{F}} = 12\,\sigma_F^2\left[\int_{-\infty}^{+\infty}F'(x)\,dF(x)\right]^2. $$


The classical 𝔉 statistic in this instance is defined by


a te. = 


rat? 


Bi ie ees 


i=l joel 


Now Fisher [4] and Tang [19] have shown that if F(x) is the normal distribution
function, then under hypothesis K_n the statistic 𝔉 has a limiting χ²_{c−1}(ν²)
distribution with ν² = Σ_i s_i[(θ_i − θ̄)/σ_F]². However, it is a well-known result
of the weak law of large numbers that [1/(n_i − 1)] Σ_{j=1}^{n_i} (x_ij − x̄_i.)² → σ_F² in
probability as n → ∞. Also the Lindeberg-Lévy central limit theorem shows
that √n_i[x̄_i. − E(x̄_i.)]/σ_F has a limiting N(0, 1) distribution. Application of the
Mann-Wald theorem used previously gives the result that under hypothesis
K_n the statistic 𝔉 has a limiting χ²_{c−1}(ν²) distribution whenever F satisfies the
hypothesis of Theorem 5.2. A calculation similar to that for the proof of Theorem
5.1 completes the proof of Theorem 5.2.

Theorems 5.1 and 5.2 show that, depending upon F, e_{M,H} can be greater than or
less than 1; similarly for e_{H,𝔉} and e_{M,𝔉} = e_{M,H} e_{H,𝔉}. In the event that F is some
normal distribution function, then e_{M,H} = 2/3, e_{H,𝔉} = 3/π, and e_{M,𝔉} = 2/π. When F
is the uniform distribution function on the unit interval, F(x) = x if 0 ≤ x ≤ 1, then
e_{M,H} = 1/3, e_{H,𝔉} = 1, and e_{M,𝔉} = 1/3.
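The efficiencies above depend on F only through three numbers: F'(a), ∫F'(x) dF(x) and the variance σ_F². The sketch below evaluates Corollary 5.1 and Theorem 5.2, and obtains e_{M,𝔉} as their product, which equals 4σ_F²[F'(a)]². It reproduces the normal and uniform values quoted in the text. The Laplace case (density ½e^{−|x|}) is an extra illustration, not taken from the text, of an F for which e_{M,H} exceeds 1.

```python
import math

def are_values(f_median, int_f_sq, variance):
    """(e_{M,H}, e_{H,F}, e_{M,F}) from F'(a), int F' dF and the variance of F."""
    e_mh = f_median ** 2 / (3.0 * int_f_sq ** 2)   # Corollary 5.1
    e_hf = 12.0 * variance * int_f_sq ** 2          # Theorem 5.2
    e_mf = e_mh * e_hf                              # equals 4 * variance * f_median ** 2
    return e_mh, e_hf, e_mf


normal = are_values(1.0 / math.sqrt(2.0 * math.pi), 1.0 / (2.0 * math.sqrt(math.pi)), 1.0)
uniform = are_values(1.0, 1.0, 1.0 / 12.0)
laplace = are_values(0.5, 0.25, 2.0)   # density exp(-|x|)/2
print(normal)    # approximately (2/3, 3/pi, 2/pi)
print(uniform)   # approximately (1/3, 1, 1/3)
print(laplace)   # approximately (4/3, 3/2, 2)
```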


6. Power functions. The power of a test for a given simple alternative hypothe- 
sis is the probability that the test will reject the hypothesis tested when the 
given alternative is true. In terms of this power definition, the power function is 
defined on the class of alternative hypotheses. 

As we have seen in Sections 3 and 4, both the H and M statistics have limiting
noncentral chi square distributions when the alternatives K_n are true for each n.
In the event that Lemma 3.6 is satisfied, the noncentral parameter in each of
these limiting distributions is a function of θ_1, ..., θ_c only through the variable
Σ s_i(θ_i − θ̄)². In fact

$$ \lambda^H = 12\left[\int_{-\infty}^{+\infty}F'(x)\,dF(x)\right]^2\sum_{i=1}^{c}s_i(\theta_i-\bar\theta)^2, \qquad \lambda^M = 4[F'(a)]^2\sum_{i=1}^{c}s_i(\theta_i-\bar\theta)^2. $$
For particular choices of F the power function of each of these tests could be
considered as a function of Σ s_i(θ_i − θ̄)². This type of power function approximation
is discussed in Cochran's paper on the chi square test for goodness of
fit [1].

The tables of Fix [5] may be employed to find approximate values for these
power functions. The procedure would be as follows. Suppose that F is the
uniform distribution function on the unit interval; then λ^H = 12Σ s_i(θ_i − θ̄)²
and λ^M = 4Σ s_i(θ_i − θ̄)². If the approximate power is desired for the test using
n_i^0 = s_i n^0 observations in the ith sample, when the alternative F_i(x) = F(x + ε_i),
i = 1, 2, ..., c, is true, set ε_i = θ_i/√n^0 and compute

$$ \lambda^H = 12\sum_{i=1}^{c}n_i^0(\varepsilon_i-\bar\varepsilon)^2, \qquad \lambda^M = 4\sum_{i=1}^{c}n_i^0(\varepsilon_i-\bar\varepsilon)^2, \qquad \bar\varepsilon = \sum_{i=1}^{c}n_i^0\varepsilon_i\Big/\sum_{i=1}^{c}n_i^0. $$
For the given level of significance and c — 1 degrees of freedom, enter the Fix 
tables and find the approximate powers for these two tests at the given alterna- 
tive. Because of the limited extent of the Fix tables, the power can be found only 
to the first decimal place without some sort of interpolation. In most instances, 
however, this accuracy should be sufficient, as it is not known how close these 
approximations are to the true power. 
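As a substitute for reading the Fix tables, the same approximate powers can be computed directly. The sketch below evaluates P{χ²_{c−1}(λ) > upper-α point of χ²_{c−1}(0)}. The central chi square CDF comes from the power series of the regularized incomplete gamma function. The noncentral CDF is the standard Poisson(λ/2) mixture of central chi square CDFs, and the critical value is found by bisection. These numerical choices, the function names and the example design (three samples of 10, shifts 0, 0.1, 0.2, F uniform) are assumptions of the sketch, not part of the paper.

```python
import math

def _reg_lower_gamma(a, x, rel_tol=1e-15, max_terms=100000):
    """Regularised lower incomplete gamma P(a, x), summed from its power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    series = term
    for n in range(1, max_terms):
        term *= x / (a + n)
        series += term
        if term < rel_tol * series:
            break
    return min(1.0, series * math.exp(-x + a * math.log(x) - math.lgamma(a)))

def chi2_cdf(x, r):
    """CDF of the central chi square distribution with r degrees of freedom."""
    return _reg_lower_gamma(r / 2.0, x / 2.0)

def ncx2_cdf(x, r, lam, tail_tol=1e-13, max_terms=10000):
    """CDF of chi^2_r(lam) as a Poisson(lam/2) mixture of central chi^2_{r+2j} CDFs."""
    if lam == 0.0:
        return chi2_cdf(x, r)
    half = lam / 2.0
    weight = math.exp(-half)  # Poisson probability of j = 0
    used = total = 0.0
    for j in range(max_terms):
        total += weight * chi2_cdf(x, r + 2 * j)
        used += weight
        if 1.0 - used < tail_tol:
            break
        weight *= half / (j + 1)
    return total

def chi2_critical(alpha, r):
    """Upper-alpha point of the central chi square distribution, by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, r) < 1.0 - alpha:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, r) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def approx_power(lam, r, alpha=0.05):
    """Limiting power P(chi^2_r(lam) > upper-alpha point of chi^2_r(0))."""
    return 1.0 - ncx2_cdf(chi2_critical(alpha, r), r, lam)

def uniform_case_lambdas(n0, eps):
    """lambda^H and lambda^M of Section 6 when F is uniform on the unit interval."""
    eps_bar = sum(n * e for n, e in zip(n0, eps)) / sum(n0)
    spread = sum(n * (e - eps_bar) ** 2 for n, e in zip(n0, eps))
    return 12.0 * spread, 4.0 * spread


# Hypothetical design: three samples of 10 observations, shifts 0, 0.1, 0.2 (c - 1 = 2).
lam_h, lam_m = uniform_case_lambdas([10, 10, 10], [0.0, 0.1, 0.2])
power_h = approx_power(lam_h, r=2)
power_m = approx_power(lam_m, r=2)
print(round(power_h, 3), round(power_m, 3))
```

Unlike table lookup, this gives more than one decimal place. As the text notes, however, the accuracy of the asymptotic approximation itself is unknown, so the extra digits describe the limiting distribution rather than the exact power.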


7. Acknowledgement. The author wishes to express his thanks to Professor 
Erich L. Lehmann for proposing this investigation and for sustained interest and 
helpful suggestions during its progress. 
MULTIDIMENSIONAL STOCHASTIC APPROXIMATION METHODS¹


By JULIUS R. BLUM
University of California, Berkeley and Indiana University


1. Summary. Multidimensional stochastic approximation schemes are pre- 
sented, and conditions are given for these schemes to converge a.s. (almost 
surely) to the solutions of k stochastic equations in k unknowns and to the point 
where a regression function in k variables achieves its maximum. 


2. Introduction. Let H(y | x) be a family of distribution functions depending
upon a real parameter x, and let M(x) = ∫ y dH(y | x) be the regression function
corresponding to the family H(y | x). Robbins and Monro [1] define a stochastic
approximation method to solve the equation M(x) = α, where α is a
specified constant. Their method is such that the approximating random variables
converge in probability to θ, where θ is a root of the equation M(x) = α.
These results are generalized by Wolfowitz [2]. Kiefer and Wolfowitz [3] define
a stochastic approximation scheme which converges in probability to θ, where θ
is the point at which M(x) achieves a maximum. Finally, it is shown [4] that, in
fact, in both of the situations mentioned above, the approximating sequence of
random variables converges a.s. to θ.

The object of this paper is to extend these results to several dimensions. More
precisely, we consider the following two problems.

(A) Let {Y^{(1)}_{x_1,...,x_k}}, ..., {Y^{(k)}_{x_1,...,x_k}} be k families of random variables with
corresponding families of distribution functions {F^{(1)}_{x_1,...,x_k}}, ..., {F^{(k)}_{x_1,...,x_k}},
each depending on k real variables (x_1, ..., x_k). Let M^{(i)}(x_1, ..., x_k) =
∫ y dF^{(i)}_{x_1,...,x_k}(y), for i = 1, ..., k, be the corresponding regression functions.
Then, if α_1, ..., α_k are k specified numbers, it is desired to find a stochastic
approximation method such that the sequence of approximating random vectors
converges a.s. to a solution of the equations

M^{(i)}(x_1, ..., x_k) = α_i, i = 1, ..., k.

Here it is assumed that the distributions F^{(i)} and the regression functions M^{(i)}
are unknown; however, it is possible to make an observation on the random
variable Y^{(i)}_{x_1,...,x_k} for i = 1, ..., k, and any choice of real numbers (x_1, ..., x_k).
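For problem (A), a generic multidimensional Robbins-Monro recursion can be sketched as x_{n+1} = x_n + a_n(α − Y_n), where Y_n collects one observation of each Y^{(i)} at the current point x_n. This is an illustrative assumption, not necessarily the scheme defined later in the paper. The step sizes a_n = 1/n and the toy regression functions M(x) = Ax, with A positive definite and Gaussian noise, are also hypothetical. Under these choices the recursion drifts toward the root of M(x) = α.

```python
import random

def robbins_monro_vector(observe, alpha, x0, n_steps, seed=0):
    """x_{n+1} = x_n + a_n (alpha - Y_n) with a_n = 1/n and Y_n observed at x_n.

    A generic multidimensional Robbins-Monro recursion, assumed for illustration;
    it converges here because the toy M(x) = A x has A positive definite.
    """
    rng = random.Random(seed)
    x = list(x0)
    for n in range(1, n_steps + 1):
        y = observe(x, rng)  # one noisy observation of each Y^(i) at x
        a_n = 1.0 / n
        x = [xi + a_n * (ai - yi) for xi, ai, yi in zip(x, alpha, y)]
    return x

def toy_observe_linear(x, rng):
    # Hypothetical M(x) = A x with A = [[2, 0.5], [0.5, 1]], plus N(0, 0.5^2) noise.
    m1 = 2.0 * x[0] + 0.5 * x[1]
    m2 = 0.5 * x[0] + 1.0 * x[1]
    return [m1 + rng.gauss(0.0, 0.5), m2 + rng.gauss(0.0, 0.5)]


est = robbins_monro_vector(toy_observe_linear, alpha=[1.0, 1.0], x0=[0.0, 0.0], n_steps=20000)
print(est)  # near (2/7, 6/7), the solution of A x = (1, 1)
```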

(B) Let {Y_{x_1,...,x_k}} be a family of random variables, F_{x_1,...,x_k} the corresponding
distribution functions, and M(x_1, ..., x_k) the corresponding regression
function. Subject to the assumptions of (A), it is desired to estimate that
set of numbers (θ_1, ..., θ_k) for which the function M achieves its maximum.

The approximating sequences defined in this paper are straightforward generalizations
of the sequences defined in [1] and [3]. The methods of proof used here
were strongly motivated by the methods used in [2] and [3].


3. A theorem on almost sure convergence. The following theorem is an immediate consequence of the martingale convergence theorem of Doob [5].

THEOREM. Let $X_n$ be a sequence of random variables satisfying

(i) $\sup_n E\{|X_n|\} < \infty$,

(ii) $\sum_{n=1}^{\infty} E\{|E\{X_{n+1} - X_n \mid X_1, \dots, X_n\}|\} < \infty$.

Then $X_n$ converges a.s. to a random variable.

As usual, we define $X^+$ by $X^+ = \tfrac12[X + |X|]$. We immediately obtain the following

COROLLARY. Let $X_n$ be a sequence of integrable random variables which satisfy condition (ii) of the theorem and are bounded below uniformly in $n$. Then $X_n$ converges a.s. to a random variable.

PROOF. Let $Y_n = X_n - a$, where $a$ is chosen so that $Y_n \ge 0$ for all $n$. Then
$$E\{|Y_n|\} = E\{Y_n\} = E\{Y_1\} + \sum_{j=1}^{n-1} E\{Y_{j+1} - Y_j\} \le E\{Y_1\} + \sum_{j=1}^{n-1} E\{[E\{X_{j+1} - X_j \mid X_1, \dots, X_j\}]^+\}.$$
Hence the theorem applies to the sequence $Y_n$ and consequently to the sequence $X_n$.


4. Convergent sequences of random vectors. Let $E_k$ be a real $k$-dimensional vector space spanned by the orthogonal unit vectors $u_1, \dots, u_k$. If $x$ and $y$ are two vectors in $E_k$, we denote their inner product by $(x, y)$ and their norms by $\|x\|$ and $\|y\|$, respectively. Suppose that to each $x \in E_k$ corresponds a random vector $Y_x \in E_k$. Denote by $M(x)$ the vector representing the conditional expectation of $Y_x$ when $x$ is fixed.

Let now $f(x)$ be a real-valued function defined on $E_k$ and possessing continuous partial derivatives of the first and second order. The vector of first partial derivatives will be denoted by $D(x)$ and the matrix of second partial derivatives by $A(x)$. That is
$$D(x) = \left(\frac{\partial f}{\partial x_i}\right), \qquad A(x) = \left(\frac{\partial^2 f}{\partial x_i \, \partial x_j}\right).$$
OX; Ox, 92; 


Then, for any real number $a$, we have by Taylor's theorem
$$f(x + aY_x) = f(x) + a(D(x), Y_x) + \tfrac12 a^2 (Y_x, A(x + \theta a Y_x) Y_x),$$





STOCHASTIC APPROXIMATION 739 


where $\theta$ is a real number with $0 < \theta < 1$. Consequently we may take expectations on both sides to obtain
$$E\{f(x + aY_x)\} = f(x) + a(D(x), M(x)) + \tfrac12 a^2 E\{(Y_x, A(x + \theta a Y_x) Y_x)\}. \tag{4.1}$$


Let now $\{a_n\}$ be a sequence of positive numbers and consider the following sequence of recursively defined random vectors
$$X_{n+1} = X_n + a_n Y_n, \tag{4.2}$$
where $X_1$ is chosen arbitrarily and where $Y_n$ has the distribution of $Y_x$ when $X_n$ yields the observation $x$. The object of this section is to set down conditions under which $X_n$ converges a.s. to zero.
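The recursion (4.2) is easy to simulate. The following minimal Python sketch is an illustration only, not taken from the paper: it assumes a hypothetical linear regression function $M(x) = Bx$ whose matrix $B$ satisfies $(x, Bx) < 0$ for $x \ne 0$ (the setting of Example II below), independent Gaussian observation noise, and the step sizes $a_n = 1/n$, which satisfy $\sum a_n = \infty$ and $\sum a_n^2 < \infty$. The helper names `robbins_monro` and `make_linear_observer` are illustrative.

```python
import random


def robbins_monro(observe, x1, steps, a=lambda n: 1.0 / n):
    """Iterate X_{n+1} = X_n + a_n * Y_n as in (4.2).

    observe(x) must return one noisy vector observation whose mean is M(x).
    """
    x = list(x1)
    for n in range(1, steps + 1):
        y = observe(x)
        x = [xi + a(n) * yi for xi, yi in zip(x, y)]
    return x


def make_linear_observer(B, sigma, rng):
    """Hypothetical test problem: Y_x = Bx + N(0, sigma^2) noise in each coordinate."""
    def observe(x):
        return [sum(bij * xj for bij, xj in zip(row, x)) + rng.gauss(0.0, sigma)
                for row in B]
    return observe


if __name__ == "__main__":
    rng = random.Random(0)
    B = [[-1.0, 0.3], [0.0, -0.5]]  # (x, Bx) < 0 for every x != 0
    x = robbins_monro(make_linear_observer(B, 1.0, rng), [5.0, -3.0], 20000)
    print(x)  # close to the root x = 0
```

With $f(x) = \|x\|^2$ this is exactly the situation of Theorem 1: the squared distance to the root behaves like a nonnegative almost-supermartingale, and the iterates settle near zero.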


To simplify writing we shall employ the following notation throughout:
$$Z_x = f(x), \qquad U(x) = (D(x), M(x)), \qquad V_a(x) = E\{(Y_x, A(x + \theta a Y_x) Y_x)\}.$$
When we substitute the random variables $X_n$ for $x$ and the numbers $a_n$ for $a$, the corresponding random variables will be denoted by $Z_n$, $U_n$, and $V_n$. We shall assume throughout that $M(0) = 0$.

Consider now the following set A of conditions:

$A_1$: $\sum_{n=1}^{\infty} a_n = \infty$, $\sum_{n=1}^{\infty} a_n^2 < \infty$;

$A_2$: $Z_x \ge 0$;

$A_3$: $\sup_{\|x\| \ge \epsilon} U(x) < 0$ for every $\epsilon > 0$;

$A_4$: $\inf_{\|x\| \ge \epsilon} |Z_x - Z_0| > 0$ for every $\epsilon > 0$;

$A_5$: $\sup_x V_a(x) \le V < \infty$ for every number $a$.

Then we have

THEOREM 1. If the sequence $\{a_n\}$ satisfies $A_1$, and if there exists a real-valued function $f(x)$ with continuous first and second partial derivatives satisfying $A_2, \dots, A_5$, then the sequence $\{X_n\}$ defined by (4.2) converges a.s. to zero.
PROOF. From (4.1) we obtain
$$E\{Z_{n+1} \mid Z_1, \dots, Z_n\} = Z_n + a_n E\{U_n \mid Z_1, \dots, Z_n\} + \tfrac12 a_n^2 E\{V_n \mid Z_1, \dots, Z_n\} \quad \text{a.s.} \tag{4.3}$$
Since $M(0) = 0$, we have, by virtue of conditions A,
$$E\{U_n \mid Z_1, \dots, Z_n\} \le 0 \ \text{a.s.}, \qquad E\{V_n \mid Z_1, \dots, Z_n\} \le V \ \text{a.s.},$$
both for all $n$. Hence
$$E\{Z_{n+1} - Z_n \mid Z_1, \dots, Z_n\} \le \tfrac12 a_n^2 V \quad \text{a.s.} \tag{4.4}$$
We may assume $V$ to be nonnegative. Using this fact together with conditions $A_1$ and $A_2$, we may apply the corollary of Section 3 to obtain
$$P\{Z_n \text{ converges}\} = 1. \tag{4.5}$$


Taking expectations on both sides of (4.3) and iterating, we have
$$E\{Z_{n+1}\} = Z_1 + \sum_{j=1}^{n} a_j E\{U_j\} + \sum_{j=1}^{n} \tfrac12 a_j^2 E\{V_j\}.$$
From what has been said above it follows that
$$E\{Z_n\} \ge 0, \qquad E\{U_n\} \le 0, \qquad E\{V_n\} \le V$$
for all $n$. Since $V$ is nonnegative and the series $\sum a_n^2$ converges, the series of nonpositive terms $\sum a_n E\{U_n\}$ also converges. By virtue of the fact that $\sum a_n = \infty$ we have
$$\limsup_{n \to \infty} E\{U_n\} = 0, \qquad \liminf_{n \to \infty} E\{|U_n|\} = 0.$$
Let $\{n_j\}$ be an infinite sequence of integers such that $\lim_{j \to \infty} E\{|U_{n_j}|\} = 0$. Then $U_{n_j}$ converges to zero in probability, and there exists a further subsequence, say $\{U_{n_{j_k}}\}$, such that
$$P\{\lim_{k \to \infty} U_{n_{j_k}} = 0\} = 1.$$
From condition $A_3$ it follows that $P\{\lim_{k \to \infty} X_{n_{j_k}} = 0\} = 1$. Since $Z_n$ is a continuous function of $X_n$, it follows from (4.5) that
$$P\{\lim_{n \to \infty} Z_n = Z_0\} = 1. \tag{4.6}$$

Now consider a sample sequence $\{X_n\}$ such that for the corresponding sequence $\{Z_n\}$ we have $\lim_{n \to \infty} Z_n = Z_0$. From condition $A_4$ it is clear that for such a sequence we must have $\lim_{n \to \infty} X_n = 0$. Hence (4.6) gives the desired result.

We may obtain the same result by assuming a slightly different set of conditions A', changing $A_3$ and $A_5$ to:

$A_3'$: There exists $\epsilon > 0$ such that $\sup_{\|x\| < \epsilon} V_a(x) \le V < \infty$ for every number $a$;

$A_5'$: There exists $\lambda > 0$, with $\lambda \ge \tfrac12 a_n$ for each $n$, such that $\sup_{\|x\| \ge \delta} [U(x) + \lambda V_a(x)] < 0$ for every $\delta > 0$ and every number $a$.

Then we have

THEOREM 2. If the sequence $\{a_n\}$ satisfies condition $A_1$ and if there exists a real-valued function $f(x)$ with continuous first and second partial derivatives satisfying $A_2$, $A_3'$, $A_4$, and $A_5'$, then the sequence $\{X_n\}$ defined by (4.2) converges a.s. to zero.

The proof of this theorem follows very closely that of Theorem 1, and so is omitted.
5. Examples. In this section we illustrate the results of the previous section by a few simple examples. Assume that the problem is as described in (A) of Section 2. Then to each $x \in E_k$ corresponds a random vector $Y_x \in E_k$ with coordinates $Y_x^{(i)}$ for $i = 1, \dots, k$. Let $M(x)$ be the vector of conditional expectations when $x$ is given. Without loss of generality we assume that $a_i = \theta_i = 0$ for $i = 1, \dots, k$.

EXAMPLE I. Let $B$ be a negative definite $k \times k$ matrix and assume

(i) for some $\rho > 0$, $\|x\| \le \rho$ implies $M(x) = Bx$;

(ii) $\|x\| > \rho$ implies $M(x) = M([\rho/\|x\|]x)$;

(iii) $(\sigma_x^{(i)})^2 \le \sigma^2 < \infty$ for each $x \in E_k$ and each $i = 1, \dots, k$, where $(\sigma_x^{(i)})^2$ is the variance of the $i$th component of $Y_x$.

Under these conditions it is clear that both $\|M(x)\|$ and $E\{\|Y_x\|^2\}$ are bounded uniformly in $x$. Now define $f(x)$ by $f(x) = \|x\|^2$. If we choose the sequence $\{a_n\}$ to satisfy condition $A_1$, we can easily verify that the remainder of condition A is satisfied. To do this we note that $A_2$ and $A_4$ are obviously satisfied from the choice of $f(x)$. Further we have
$$U(x) = \begin{cases} 2(x, Bx) & \text{if } \|x\| \le \rho, \\ 2[\rho/\|x\|](x, Bx) & \text{if } \|x\| > \rho, \end{cases} \qquad V_a(x) = 2E\{\|Y_x\|^2\} \text{ for every number } a.$$
From the boundedness of $E\{\|Y_x\|^2\}$ it is clear that $A_5$ is also satisfied. It remains to check $A_3$. To do this we recall that for every negative definite matrix $B$ there exists a positive number $b$ such that $(x, Bx) \le -b\|x\|^2$. Thus if $\epsilon$ is any positive number with $0 < \epsilon \le \rho$, we have
$$(x, Bx) \le -b\epsilon^2 \ \text{ if } \epsilon \le \|x\| \le \rho, \qquad [\rho/\|x\|](x, Bx) \le -b\rho^2 \ \text{ if } \|x\| > \rho.$$
Hence $A_3$ is also satisfied and Theorem 1 applies.
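Condition $A_3$ in Example I can also be checked numerically. The Python sketch below is an illustration under stated assumptions: it takes a hypothetical $2 \times 2$ matrix $B$, computes the constant $b$ as the smallest eigenvalue of $-(B + B')/2$ (the largest $b$ with $(x, Bx) \le -b\|x\|^2$), evaluates $U(x) = (2x, M(x))$ for the truncated regression function of assumptions (i) and (ii), and compares the largest value of $U$ on a grid with the bound $-2b\epsilon^2$. All function names are hypothetical.

```python
import math


def matvec(B, x):
    """Matrix-vector product Bx for a list-of-rows matrix."""
    return [sum(bij * xj for bij, xj in zip(row, x)) for row in B]


def norm(x):
    return math.sqrt(sum(xi * xi for xi in x))


def truncated_M(B, rho, x):
    """Regression function of Example I: Bx for ||x|| <= rho, M(rho x / ||x||) beyond."""
    r = norm(x)
    if r <= rho:
        return matvec(B, x)
    return matvec(B, [rho / r * xi for xi in x])


def U(B, rho, x):
    """U(x) = (D(x), M(x)) with f(x) = ||x||^2, so that D(x) = 2x."""
    return sum(2.0 * xi * mi for xi, mi in zip(x, truncated_M(B, rho, x)))


def b_constant(B):
    """Largest b with (x, Bx) <= -b ||x||^2 for a 2x2 matrix B.

    This is the smallest eigenvalue of the symmetric matrix -(B + B')/2;
    it is positive exactly when (x, Bx) < 0 for all x != 0.
    """
    s11, s22 = -B[0][0], -B[1][1]
    s12 = -(B[0][1] + B[1][0]) / 2.0
    return (s11 + s22) / 2.0 - math.sqrt(((s11 - s22) / 2.0) ** 2 + s12 ** 2)


if __name__ == "__main__":
    B = [[-1.0, 0.3], [0.0, -0.5]]
    rho, eps = 2.0, 0.5
    b = b_constant(B)
    worst = max(U(B, rho, [r * math.cos(t / 10), r * math.sin(t / 10)])
                for r in (0.5, 1.0, 2.0, 5.0) for t in range(63))
    print(b, worst, -2 * b * eps ** 2)  # worst <= -2 b eps^2
```

Outside the ball the truncation makes $U(x) = 2(\rho/\|x\|)(x, Bx) \le -2b\rho\|x\|$, so the bound only gets stronger as $\|x\|$ grows.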
EXAMPLE II. Consider a negative definite matrix $B$ and assume

(i) $M(x) = Bx$;

(ii) there exist $\epsilon > 0$ and $C > 0$ such that $\|x\| \le \epsilon$ implies $E\{\|Y_x\|^2\} < C$;

(iii) there exists $p > 0$ such that $\|x\| > \epsilon$ implies $(x, Bx) + pE\{\|Y_x\|^2\} < 0$.

With $f(x)$ again defined by $f(x) = \|x\|^2$, we have
$$U(x) = 2(x, Bx), \qquad V_a(x) = 2E\{\|Y_x\|^2\} \text{ for all } a.$$
Hence it is clear that if we choose the sequence $\{a_n\}$ to satisfy condition $A_1$, we need only verify $A_5'$, since the other conditions follow immediately. To do this, assume first that $\|x\| \le \epsilon$, with $\epsilon$ as determined by assumption (ii) of this example, and let $\lambda$ be any positive number. Let $b > 0$ be such that $(x, Bx) \le -b\|x\|^2$. Then we have
$$U(x) + \lambda V_a(x) = 2[(x, Bx) + \lambda E\{\|Y_x\|^2\}] \le 2[(x, Bx) + \lambda C] \le 2[-b\|x\|^2 + \lambda C].$$
Hence it is clear that if $0 < \delta \le \|x\| \le \epsilon$, we can choose $\lambda_1$ such that
$$U(x) + \lambda_1 V_a(x) \le 2[-b\delta^2 + \lambda_1 C] < 0,$$
and if $\|x\| > \epsilon$, choose $0 < \lambda_2 < p$, where $p$ is determined by assumption (iii) of the example. Then
$$\frac{U(x) + \lambda_2 V_a(x)}{2} = \left(1 - \frac{\lambda_2}{p}\right)(x, Bx) + \frac{\lambda_2}{p}\left[(x, Bx) + pE\{\|Y_x\|^2\}\right] \le \left(1 - \frac{\lambda_2}{p}\right)(x, Bx) \le -\left(1 - \frac{\lambda_2}{p}\right) b\epsilon^2 < 0.$$
Hence by choosing $\lambda = \min(\lambda_1, \lambda_2)$ we satisfy condition $A_5'$ and Theorem 2 applies.


6. The maximum of a regression function in several variables. In this section we turn to problem (B) of Section 2. Assume once more that $x$ is a variable point in $E_k$ and to each $x$ corresponds a random variable $Y_x$, with corresponding regression function $M(x)$. Assume, without loss of generality, that $M(x)$ has a unique maximum at $x = 0$. The problem becomes one of constructing a sequence $\{X_n\}$ of random vectors with the property
$$P\{\lim_{n \to \infty} X_n = 0\} = 1.$$
Let $\{a_n\}$ and $\{c_n\}$ be two infinite sequences of positive numbers satisfying conditions B:

$B_1$: $\lim_{n \to \infty} c_n = 0$;

$B_2$: $\sum_{n=1}^{\infty} a_n = \infty$;

$B_3$: $\sum_{n=1}^{\infty} a_n c_n < \infty$;

$B_4$: $\sum_{n=1}^{\infty} a_n^2 / c_n^2 < \infty$.


Suppose now $x \in E_k$ and let $c$ be a positive number. Let $u_1, \dots, u_k$ be the orthonormal set spanning $E_k$. We construct a random vector $Y_{x,c}$ by taking $k + 1$ independent observations on the random variables $Y_x, Y_{x+cu_1}, \dots, Y_{x+cu_k}$ and defining
$$Y_{x,c} = \big((Y_{x+cu_1} - Y_x), \dots, (Y_{x+cu_k} - Y_x)\big).$$
We proceed to construct a recursive sequence of random vectors by choosing $X_1$ arbitrarily and defining
$$X_{n+1} = X_n + a_n Y_n / c_n, \tag{6.1}$$
where $Y_n$ has the distribution of $Y_{x,c_n}$ when $X_n$ yields the observation $x$. The intuitive reason for (6.1) is fairly clear, since $Y_n/c_n$ is the vector in the direction of the maximum slope of the plane determined by the $k + 1$ vectors
$$(X_n, Y_{X_n}), \ (X_n + c_n u_1, Y_{X_n + c_n u_1}), \ \dots, \ (X_n + c_n u_k, Y_{X_n + c_n u_k}).$$
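A minimal Python sketch of the recursion (6.1), for illustration only. It assumes a hypothetical test function $M(x) = -(x_1^2 + 2x_2^2)$ (maximum $0$ at $x = 0$) observed with Gaussian noise, and the illustrative choices $a_n = 1/n$, $c_n = n^{-1/4}$, which satisfy $c_n \to 0$, $\sum a_n = \infty$, $\sum a_n c_n = \sum n^{-5/4} < \infty$, and $\sum a_n^2/c_n^2 = \sum n^{-3/2} < \infty$, the requirements used in the proof of Theorem 3. Each step takes the $k + 1$ independent observations at $X_n$ and $X_n + c_n u_i$ and forms the forward differences described above; the function names are hypothetical.

```python
import random


def kiefer_wolfowitz(observe, x1, steps,
                     a=lambda n: 1.0 / n, c=lambda n: n ** -0.25):
    """Iterate X_{n+1} = X_n + a_n Y_n / c_n as in (6.1).

    Y_n has coordinates Y(X_n + c_n u_i) - Y(X_n), built from k + 1
    independent observations per step (forward differences).
    """
    x = list(x1)
    k = len(x)
    for n in range(1, steps + 1):
        an, cn = a(n), c(n)
        base = observe(x)                  # observation at X_n
        y = []
        for i in range(k):
            shifted = list(x)
            shifted[i] += cn               # the point X_n + c_n u_i
            y.append(observe(shifted) - base)
        x = [xi + an * yi / cn for xi, yi in zip(x, y)]
    return x


def make_quadratic_observer(sigma, rng):
    """Hypothetical test problem: M(x) = -(x_1^2 + 2 x_2^2) plus N(0, sigma^2) noise."""
    def observe(x):
        return -(x[0] ** 2 + 2.0 * x[1] ** 2) + rng.gauss(0.0, sigma)
    return observe


if __name__ == "__main__":
    rng = random.Random(2)
    print(kiefer_wolfowitz(make_quadratic_observer(0.1, rng), [1.0, -1.0], 20000))
```

With forward differences the iterates carry a bias of order $c_n$ (for this test function roughly $-c_n/2$ in each coordinate), which disappears only because $c_n \to 0$; this is why condition $B_1$ is needed alongside the step-size conditions.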


We denote the vector of first partial derivatives and the matrix of second partial derivatives of $M(x)$ by $D(x)$ and $A(x)$, respectively. We write $D_n$ for $D(X_n)$ and $A_n$ for $A(X_n)$, and denote by $\Lambda_n$ the vector whose coordinates are the diagonal entries of $A_n$, by $\mu_n$ the vector $E\{Y_n \mid X_n\}$, and by $\sigma_x^2$ the variance of $Y_x$. Without loss of generality we assume that $M(0) = 0$, so that $M(x) \le 0$ for all $x$. Then we have

THEOREM 3. Suppose the sequences $\{a_n\}$ and $\{c_n\}$ satisfy conditions B and further that

(i) $M(x)$ is continuous with continuous first and second derivatives;

(ii) $\sigma_x^2 \le \sigma^2 < \infty$;

(iii) for every positive number $\epsilon$ there exists a positive number $\rho(\epsilon)$ such that $\|x\| \ge \epsilon$ implies $M(x) \le -\rho(\epsilon)$ and $\|D(x)\| \ge \rho(\epsilon)$;

(iv) the second partial derivatives $\partial^2 M(x)/\partial x_i \partial x_j$ are bounded for $i, j = 1, \dots, k$.

Then the sequence $\{X_n\}$ defined by (6.1) converges a.s. to zero.
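PROOF. Expanding $-M(X_{n+1})$ we obtain, with $0 < \theta < 1$,
$$-M(X_{n+1}) = -M(X_n) - \frac{a_n}{c_n}(D_n, Y_n) - \frac{a_n^2}{2c_n^2}\left(Y_n, A\!\left(X_n + \theta\frac{a_n}{c_n}Y_n\right)Y_n\right).$$
Taking conditional expectation for given $X_n$ we have
$$E\{-M(X_{n+1}) \mid X_n\} = -M(X_n) - \frac{a_n}{c_n}(D_n, \mu_n) - \frac{a_n^2}{2c_n^2} E\left\{\left(Y_n, A\!\left(X_n + \theta\frac{a_n}{c_n}Y_n\right)Y_n\right) \Bigm| X_n\right\} \quad \text{a.s.}$$
Since $A(x)$ is a bounded matrix and $\sigma_x^2$ is bounded, we have
$$\left|E\left\{\left(Y_n, A\!\left(X_n + \theta\frac{a_n}{c_n}Y_n\right)Y_n\right) \Bigm| X_n\right\}\right| \le K_1\|\mu_n\|^2 + K_2,$$
where $K_1$ and $K_2$ are suitably chosen positive constants. By virtue of the hypothesis we obtain
$$\mu_n^{(i)} = c_n(D_n, u_i) + \tfrac12 c_n^2 (u_i, A(X_n + \theta_i c_n u_i) u_i), \qquad i = 1, \dots, k,$$
where $\mu_n^{(i)}$ is the $i$th component of $\mu_n$ and $0 < \theta_i < 1$ for $i = 1, \dots, k$. Hence
$$(D_n, \mu_n) = c_n\|D_n\|^2 + \tfrac12 c_n^2 (D_n, \Lambda_n),$$
$$\|\mu_n\|^2 = c_n^2\|D_n\|^2 + c_n^3 (D_n, \Lambda_n) + \tfrac14 c_n^4 \|\Lambda_n\|^2.$$
Now by hypothesis $\|\Lambda_n\|$ is bounded, say $\|\Lambda_n\|^2 < K_3$. Then
$$|(D_n, \Lambda_n)|^2 \le K_3\|D_n\|^2.$$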
After some computation we find
$$E\{-M(X_{n+1}) \mid X_n\} \le -M(X_n) - a_n\left\{\|D_n\|^2[1 - \tfrac12 K_1 a_n] - \|D_n\| K_3^{1/2}[\tfrac12 c_n - \tfrac12 K_1 a_n c_n]\right\} + \tfrac18 K_1 K_3 a_n^2 c_n^2 + \tfrac12 K_2 \frac{a_n^2}{c_n^2} \quad \text{a.s.},$$
where $n$ is chosen so large that $[1 - \tfrac12 K_1 a_n]$ and $[c_n - K_1 a_n c_n]$ are both nonnegative.

Let $\lambda_n$ be a sequence of random variables defined by
$$\lambda_n = \begin{cases} 1 & \text{if } \|D_n\| \ge 1, \\ 0 & \text{otherwise.} \end{cases}$$
We note that for $n$ sufficiently large we have
$$\lambda_n\|D_n\|^2[1 - \tfrac12 K_1 a_n] - \lambda_n\|D_n\| K_3^{1/2}[\tfrac12 c_n - \tfrac12 K_1 a_n c_n] \ge 0. \tag{6.2}$$
Hence, for such $n$ we obtain
$$E\{-M(X_{n+1}) \mid X_n\} \le -M(X_n) + a_n c_n (1 - \lambda_n)\|D_n\| K_3^{1/2}[\tfrac12 - \tfrac12 K_1 a_n] + \tfrac18 K_1 K_3 a_n^2 c_n^2 + \tfrac12 K_2 \frac{a_n^2}{c_n^2} \quad \text{a.s.}$$
This inequality clearly is still preserved if we take conditional expectations with respect to $M(X_n)$ on both sides. But now we note that
$$\sum_n a_n c_n K_3^{1/2}[\tfrac12 - \tfrac12 K_1 a_n]\, E\{(1 - \lambda_n)\|D_n\| \mid M(X_n)\} \ \text{converges a.s.};$$
$$\sum_n \tfrac18 K_1 K_3 a_n^2 c_n^2 \ \text{ and } \ \sum_n \tfrac12 K_2 a_n^2/c_n^2 \ \text{ both converge.}$$
These follow from conditions B and the definition of $\lambda_n$. Hence we may again apply the corollary of Section 3 to obtain that $M(X_n)$ converges a.s. to a random variable. Now we note that $\sum a_n$ diverges to $+\infty$ and that $M(X_n) \le 0$. Hence the series
$$\sum_{j=1}^{\infty} a_j E\left\{\|D_j\|^2[1 - \tfrac12 K_1 a_j] - \lambda_j\|D_j\| K_3^{1/2}[\tfrac12 c_j - \tfrac12 K_1 a_j c_j]\right\}$$
converges. This, together with (6.2), ensures the existence of a subsequence $D_{n_j}$ with the property $P\{\lim_{j \to \infty} D_{n_j} = 0\} = 1$. Hence, by (iii), $X_{n_j}$ converges a.s. to zero. Since $M(x)$ is continuous and $M(0) = 0$, we have $P\{\lim_{n \to \infty} M(X_n) = 0\} = 1$, which implies the desired result.
CERTAIN INEQUALITIES IN INFORMATION THEORY AND THE
CRAMÉR-RAO INEQUALITY

By S. KULLBACK
The George Washington University 


1. Summary and Introduction. The Cramér-Rao inequality provides, under certain regularity conditions, a lower bound for the variance of an estimator [7], [15]. Various generalizations, extensions, and improvements in the bound have been made by Barankin [1], [2], Bhattacharyya [3], Chapman and Robbins [5], Fraser and Guttman [11], Kiefer [12], and Wolfowitz [16], among others.

Further consideration of certain inequality properties of a measure of information, discussed by Kullback and Leibler [14], yields a greater lower bound for the information measure (formula (4.11)) and leads to a result which may be considered a generalization of the Cramér-Rao inequality, the latter following as a special case. The results are used to define discrimination efficiency and estimation efficiency at a point in parameter space.


2. The first inequality. We use the notation and terminology of [14]. Consider the measurable transformations $T_N$ of the probability spaces $(\mathcal{X}, \mathcal{S}, \mu_i)$ onto the probability spaces $(\mathcal{Y}, \mathcal{T}, \nu_i^{(N)})$, and suppose for $G \in \mathcal{T}$ that $\nu_i^{(N)}(G) = \mu_i(T_N^{-1}G)$ for $i = 1$ or $2$.

THEOREM 2.1. Let the $T_N$ be such that
$$\lim_{N \to \infty} \nu_i^{(N)}(G) = \nu_i(G). \tag{2.1}$$
Then
$$I(1:2; x) \ge \liminf_{N \to \infty} I_N(1:2; y) \ge I'(1:2; y); \tag{2.2}$$
$$J(1, 2; x) \ge \liminf_{N \to \infty} J_N(1, 2; y) \ge J'(1, 2; y). \tag{2.2'}$$

PROOF. We first derive a result which is similar to a lemma used by Doob [8]. Using Lemma 3.2 of [14], we have
$$I_N(1:2; y) \ge \sum_j \nu_1^{(N)}(G_j) \log \frac{\nu_1^{(N)}(G_j)}{\nu_2^{(N)}(G_j)}, \tag{2.3}$$
where the sum is taken over any set of pairwise disjoint $G_j$ such that $\bigcup_j G_j = \mathcal{Y}$. Accordingly,
$$\liminf_{N \to \infty} I_N(1:2; y) \ge \sum_j \nu_1(G_j) \log \frac{\nu_1(G_j)}{\nu_2(G_j)}, \tag{2.4}$$
and therefore
$$\liminf_{N \to \infty} I_N(1:2; y) \ge I'(1:2; y), \tag{2.5}$$
since the right member of (2.5) is the l.u.b. of the right member of (2.4). In conjunction with Theorem 4.1 and paragraph 5 of [14], (2.5) yields the inequalities (2.2) and (2.2'). These are used herein only in Section 3.
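The grouping inequality (2.3) used in this proof says that merging outcomes into sets $G_j$ can only lose discriminating information. A small Python sketch for discrete distributions illustrates it; the probabilities and partitions below are hypothetical examples, not taken from the paper, and the helper names are illustrative.

```python
import math


def information(p1, p2):
    """Discrete I(1:2) = sum_j p1_j log(p1_j / p2_j); cells with p1_j = 0 contribute 0."""
    return sum(a * math.log(a / b) for a, b in zip(p1, p2) if a > 0)


def group(p, partition):
    """Probabilities nu(G_j) of the disjoint sets G_j (lists of atom indices)."""
    return [sum(p[i] for i in g) for g in partition]


if __name__ == "__main__":
    p1 = [0.1, 0.2, 0.3, 0.4]
    p2 = [0.25, 0.25, 0.25, 0.25]
    fine = information(p1, p2)
    partition = [[0, 1], [2, 3]]
    coarse = information(group(p1, partition), group(p2, partition))
    print(fine, coarse)  # the grouped value never exceeds the ungrouped one
```

The l.u.b. over all such partitions recovers the full information, which is the step from (2.4) to (2.5).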


3. An example. Consider $N$ independent observations from the binomial distributions $B(p_i, q_i)$, for $i = 1$ or $2$, which as $N \to \infty$ approach as limits the Poisson exponential distributions with means $m_i = Np_i$, for $i = 1$ or $2$. It may be verified readily that
$$I_N(1:2; y) = N\left(p_1 \log \frac{p_1}{p_2} + q_1 \log \frac{q_1}{q_2}\right), \tag{3.1}$$
$$I'(1:2; y) = \sum_{y=0}^{\infty} \frac{m_1^y e^{-m_1}}{y!} \log \frac{m_1^y e^{-m_1}}{m_2^y e^{-m_2}} = (m_2 - m_1) + m_1 \log \frac{m_1}{m_2}. \tag{3.2}$$
Using the well known inequality $x_1 \log(x_1/x_2) \ge x_1 - x_2$, and $m_i = Np_i$ for $i = 1$ or $2$, it is found that
$$Np_1 \log \frac{p_1}{p_2} + Nq_1 \log \frac{q_1}{q_2} = m_1 \log \frac{m_1}{m_2} + N\left(1 - \frac{m_1}{N}\right) \log \frac{1 - m_1/N}{1 - m_2/N} \ge m_1 \log \frac{m_1}{m_2} + N\left(\frac{m_2}{N} - \frac{m_1}{N}\right) = m_1 \log \frac{m_1}{m_2} + (m_2 - m_1), \tag{3.3}$$
or
$$\liminf_{N \to \infty} I_N(1:2; y) \ge I'(1:2; y). \tag{3.4}$$
As a matter of fact, for this particular case, as may be readily seen from the first two members of (3.3),
$$\lim_{N \to \infty} I_N(1:2; y) = I'(1:2; y). \tag{3.5}$$
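The limit (3.5) can be watched numerically. The Python sketch below (illustrative helper names, with the arbitrary choice $m_1 = 2$, $m_2 = 3$) evaluates (3.1) with $p_i = m_i/N$ and subtracts (3.2). Expanding the second term of (3.3) in powers of $1/N$ shows the gap is positive and of order $(m_2 - m_1)^2/(2N)$.

```python
import math


def binomial_info(N, p1, p2):
    """I_N(1:2; y) in (3.1) for N Bernoulli trials with success probabilities p1, p2."""
    q1, q2 = 1.0 - p1, 1.0 - p2
    return N * (p1 * math.log(p1 / p2) + q1 * math.log(q1 / q2))


def poisson_info(m1, m2):
    """I'(1:2; y) in (3.2) for Poisson means m1, m2."""
    return m1 * math.log(m1 / m2) + (m2 - m1)


if __name__ == "__main__":
    m1, m2 = 2.0, 3.0
    limit = poisson_info(m1, m2)
    for N in (10, 100, 1000, 10000):
        gap = binomial_info(N, m1 / N, m2 / N) - limit
        print(N, gap)  # positive, shrinking roughly like (m2 - m1)^2 / (2N)
```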


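4. The second inequality. Suppose $g_1(y)$, $g_2(y)$, and $g^*(y)$ are densities satisfying the conditions of paragraph 4 of [14]. Then, using Lemma 3.1 of [14],
$$\int g_1(y) \log \frac{g_1(y)}{g_2(y)} \, d\gamma(y) + \int g_1(y) \log \frac{g_2(y)}{g^*(y)} \, d\gamma(y) = \int g_1(y) \log \frac{g_1(y)}{g^*(y)} \, d\gamma(y) \ge 0, \tag{4.1}$$
$$\int g_1(y) \log \frac{g_1(y)}{g_2(y)} \, d\gamma(y) \ge \int g_1(y) \log \frac{g^*(y)}{g_2(y)} \, d\gamma(y). \tag{4.2}$$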
In particular, let us take, for real $t$,
$$g^*(y) = \frac{e^{ty} g_2(y)}{M_2(t)}, \qquad M_2(t) = \int e^{ty} g_2(y) \, d\gamma(y), \tag{4.3}$$
so that (4.2) becomes
$$I'(1:2; y) \ge at - \log M_2(t), \qquad a = E_1(y), \tag{4.4}$$
with equality if and only if
$$g_1(y) = \frac{e^{ty} g_2(y)}{M_2(t)} \quad [\gamma]. \tag{4.5}$$

To investigate further the right member of (4.4), we will use the notation and, in particular, the results of paragraphs 4 and 6 of Chernoff [6]. Clearly
$$I'(1:2; y) \ge \sup_t \,[at - \log M_2(t)] = -\log m_2(a), \tag{4.6}$$
where $m_2(a) = \inf_t e^{-at} M_2(t)$. Note that for the value of $t$ satisfying $a = M_2'(t(a))/M_2(t(a))$, we have
$$-\log m_2(a) = \int g^*(y) \log \frac{g^*(y)}{g_2(y)} \, d\gamma(y) \ge 0.$$
From this, or the results of Lemma 7 of Chernoff [6], it follows that $-\log m_2(a)$ is a convex function of $a$. Limiting ourselves to statistics $y$ for which $E_2(y)$ and $\mathrm{Var}_2(y)$ are finite, the results of Chernoff [6] may also be derived for the case $a \ge E_2(y)$.

We can write
$$\log m_2(a) = \log m_2(E_2(y)) + (a - E_2(y)) \left.\frac{d \log m_2(a)}{da}\right|_{a = E_2(y)} + \frac{(a - E_2(y))^2}{2!} \left.\frac{d^2 \log m_2(a)}{da^2}\right|_{a = b}, \tag{4.7}$$
where $b$ is between $a$ and $E_2(y)$. But as Chernoff [6] has shown,
$$\log m_2(E_2(y)) = 0, \qquad \left.\frac{d \log m_2(a)}{da}\right|_{a = E_2(y)} = 0, \qquad \left.\frac{dt(a)}{da}\right|_{a = E_2(y)} = \frac{1}{\mathrm{Var}_2(y)}, \qquad \frac{d^2 \log m_2(b)}{db^2} = -\frac{1}{\mathrm{Var}_b(y)}, \tag{4.8}$$
where $\mathrm{Var}_b(y)$ is the variance of $y$ for the distribution defined by
$$g_b(y) = e^{t(b) y} g_2(y) / M_2(t(b)). \tag{4.9}$$
From (4.6), (4.7), and (4.8) it follows that
$$I'(1:2; y) \ge \frac{(E_1(y) - E_2(y))^2}{2 \, \mathrm{Var}_b(y)}, \tag{4.10}$$
where [13] the right side is the value of $I(1:2)$ for two normal distributions with common variance $\mathrm{Var}_b(y)$ and means $E_1(y)$ and $E_2(y)$.
We take $y$ as the linear function $y = c_1 y_1 + c_2 y_2 + \cdots + c_k y_k$, where the random variables $y_1, y_2, \dots, y_k$ are such that the requirements already imposed on $y$ are satisfied and $\mathrm{Var}_b(y) = \sum_{i=1}^{k} \sum_{j=1}^{k} c_i c_j \operatorname{cov}_b(y_i, y_j)$. Then, as is known [13], the l.u.b. of the right member of (4.10) for possible values of the $c$'s is given by the quadratic form $\tfrac12 \delta' \sigma^{-1} \delta$, where $\delta$ is the one-column matrix of the differences $\delta_j = E_1(y_j) - E_2(y_j)$ for $j = 1, 2, \dots, k$, and $\delta'$ is the transpose of $\delta$, while $\sigma$ is the matrix of variances and covariances of the $y_j$ for $j = 1, 2, \dots, k$ in the distribution defined by (4.9).

We thus have the second inequality
$$I(1:2; x) \ge I'(1:2; y) \ge \tfrac12 \delta' \sigma^{-1} \delta. \tag{4.11}$$


For the binomial distribution, this yields
$$p_1 \log \frac{p_1}{p_2} + q_1 \log \frac{q_1}{q_2} \ge \frac{(p_1 - p_2)^2}{2 p_* q_*}, \qquad p_* = \frac{p_2 e^t}{p_2 e^t + q_2}, \quad q_* = \frac{q_2}{p_2 e^t + q_2}, \tag{4.12}$$
for some value of $t$ between $0$ (when $p_* = p_2$) and $\log(p_1 q_2 / q_1 p_2)$ (when $p_* = p_1$). Note that $p_* = b$, and that from our derivation $b$ is between $p_1$ and $p_2$.
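For a single Bernoulli observation the chain (4.6) to (4.12) can be checked numerically. The Python sketch below is an illustration with hypothetical helper names. It evaluates $\sup_t [at - \log M_2(t)]$ with $a = p_1$ and $M_2(t) = q_2 + p_2 e^t$ by ternary search, which is valid because the function is concave in $t$. For this exponential family the supremum equals $p_1 \log(p_1/p_2) + q_1 \log(q_1/q_2)$, the equality case (4.5). It also brackets that value, using only the fact that $p_*$ in (4.12) lies between $p_1$ and $p_2$.

```python
import math


def kl_bernoulli(p1, p2):
    """Left side of (4.12): I(1:2) for one Bernoulli observation."""
    return p1 * math.log(p1 / p2) + (1 - p1) * math.log((1 - p1) / (1 - p2))


def legendre_rate(a, p2, lo=-50.0, hi=50.0, iters=200):
    """-log m_2(a) = sup_t [a t - log M_2(t)] with M_2(t) = q_2 + p_2 e^t.

    The objective is concave in t, so a ternary search finds the maximum.
    """
    def h(t):
        return a * t - math.log((1 - p2) + p2 * math.exp(t))

    for _ in range(iters):
        t1 = lo + (hi - lo) / 3
        t2 = hi - (hi - lo) / 3
        if h(t1) < h(t2):
            lo = t1
        else:
            hi = t2
    return h((lo + hi) / 2)


def normal_bracket(p1, p2):
    """Bounds on the left side of (4.12), using only that p_* lies between p1 and p2."""
    lo_p, hi_p = min(p1, p2), max(p1, p2)
    ends = (lo_p * (1 - lo_p), hi_p * (1 - hi_p))
    var_max = 0.25 if lo_p <= 0.5 <= hi_p else max(ends)
    var_min = min(ends)
    d2 = (p1 - p2) ** 2
    return d2 / (2 * var_max), d2 / (2 * var_min)


if __name__ == "__main__":
    p1, p2 = 0.3, 0.5
    print(kl_bernoulli(p1, p2), legendre_rate(p1, p2), normal_bracket(p1, p2))
```

The lower end of the bracket is the familiar normal-approximation value with the largest admissible variance. Any single choice of $p_*$, such as $p_2$, need not give a valid bound.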


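5. The Cramér-Rao inequality. For the parametric case, where the populations are neighboring points $\theta$ and $\theta + \Delta\theta$ in the $k$-dimensional parameter space and the $y_j$ for $j = 1, 2, \dots, k$ are unbiased estimators of the parameters, (4.11) yields, under suitable regularity conditions [14],
$$(\Delta\theta)' G (\Delta\theta) \ge (\Delta\theta)' H (\Delta\theta) \ge (\Delta\theta)' \sigma^{-1} (\Delta\theta), \tag{5.1}$$
where $\Delta\theta$ is the one-column matrix of the $\Delta\theta_j$ for $j = 1, 2, \dots, k$ and $(\Delta\theta)'$ is its transpose, while $G$ and $H$ are respectively the matrices $(g_{\alpha\beta})$ and $(h_{\alpha\beta})$, for $\alpha, \beta = 1, 2, \dots, k$, where
$$g_{\alpha\beta} = \int f(x) \left(\frac{\partial}{\partial\theta_\alpha} \log f(x)\right) \left(\frac{\partial}{\partial\theta_\beta} \log f(x)\right) d\lambda(x),$$
$$h_{\alpha\beta} = \int g(y) \left(\frac{\partial}{\partial\theta_\alpha} \log g(y)\right) \left(\frac{\partial}{\partial\theta_\beta} \log g(y)\right) d\gamma(y),$$
and $\sigma$ is the matrix of variances and covariances of the estimators.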

It should be observed that the discussion in Sections 4 and 5 holds whether we are dealing with a fixed sample size or a sequential procedure. For the latter case ([16], p. 216), let $\mathcal{X}$, of the probability spaces $(\mathcal{X}, \mathcal{S}, \mu_i)$, be the space of all possible infinite sequences $(x)$ of observations $x_1, x_2, \dots$. Let there be given an infinite sequence of Borel measurable functions $\phi_1(x_1), \phi_2(x_1, x_2), \dots, \phi_j(x_1, x_2, \dots, x_j), \dots$, defined for all observable sequences in $\mathcal{X}$, such that each takes only the values zero and one. We further assume that at least one of the functions $\phi_1(x_1), \phi_2(x_1, x_2), \dots$ takes the value one $[\lambda(x)]$, and let $n$ be the smallest integer for which this occurs. Thus $n(x)$ is a chance variable.

The sequential process is then defined as follows. Take an observation and find $\phi_1(x_1)$. If it is unity, the sampling process stops; otherwise sampling continues. If a second observation is taken and the value of $\phi_2(x_1, x_2)$ is unity, the process stops; otherwise it continues, and so on. In general, after taking $j$ observations, $\phi_i(x_1, x_2, \dots, x_i) = 0$ for $i = 1, 2, \dots, j - 1$. If $\phi_j(x_1, x_2, \dots, x_j) = 1$, sampling stops; otherwise it is continued.

If $R_j$ denotes the set of all points $(x_1, x_2, \dots)$ for which the process stops with the $j$th observation, then $\mathcal{X} = \bigcup_j R_j$. The variable $y$ is taken as a function of the observations $x_1, x_2, \dots, x_n$ (those obtained prior to the termination of the process of drawing observations).

Thus the results in (4.11) and (5.1) hold for fixed sample size or sequential procedures.


6. Quadratic forms. Certain useful results with respect to quadratic forms, which are essentially corollaries of known theorems, are needed for the subsequent discussion.

LEMMA 6.1. If both $X'AX$ and $X'CX$ are positive definite quadratic forms (matrix notation) such that $X'AX \ge X'CX$, then

(a) the roots of $|A - \lambda C| = 0$ are real and $\ge 1$;

(b) $|A| \ge |C|$;

(c) any principal minor of $A$ is not less than the corresponding principal minor of $C$ (determinant or quadratic form);

(d) $Y'C^{-1}Y \ge Y'A^{-1}Y$;

(e) any principal minor of $C^{-1}$ is not less than the corresponding principal minor of $A^{-1}$ (determinant or quadratic form).

PROOF. Results (a), (b), and (c) are immediate corollaries of Theorems 44 and 48 in Ferrar [10]. Since $A^{-1} = C^{-1}CA^{-1}$ and $C^{-1} = C^{-1}AA^{-1}$, there exists a nonsingular matrix $B$ such that (Bôcher [4], p. 301) $C^{-1} = B'AB$ and $A^{-1} = B'CB$. Thus applying the transformation $X = BY$ gives
$$X'AX = Y'B'ABY = Y'C^{-1}Y, \qquad X'CX = Y'B'CBY = Y'A^{-1}Y,$$
and (d) and (e) then follow.
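Lemma 6.1 is easy to spot-check in two dimensions. The Python sketch below is an illustration, not part of the original argument. It takes a hypothetical positive definite matrix $C$ and sets $A = C + P$ with $P$ positive definite, so that $X'AX \ge X'CX$. It then computes the roots of $|A - \lambda C| = 0$ from the quadratic $|C|\lambda^2 - (a_{11}c_{22} + a_{22}c_{11} - 2a_{12}c_{12})\lambda + |A| = 0$, together with the determinants and inverses that enter (b) and (e). The function names are illustrative.

```python
import math


def det2(M):
    """Determinant of a 2x2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


def inv2(M):
    """Inverse of a nonsingular 2x2 matrix."""
    d = det2(M)
    return [[M[1][1] / d, -M[0][1] / d], [-M[1][0] / d, M[0][0] / d]]


def form(M, y):
    """Quadratic form y'My for a 2x2 matrix M."""
    return sum(y[i] * M[i][j] * y[j] for i in range(2) for j in range(2))


def pencil_roots(A, C):
    """Roots of |A - lam C| = 0 for symmetric 2x2 A and p.d. C, in increasing order.

    |A - lam C| = |C| lam^2 - (a11 c22 + a22 c11 - 2 a12 c12) lam + |A|.
    """
    qa = det2(C)
    qb = -(A[0][0] * C[1][1] + A[1][1] * C[0][0] - 2.0 * A[0][1] * C[0][1])
    qc = det2(A)
    disc = math.sqrt(max(qb * qb - 4.0 * qa * qc, 0.0))  # real for a definite pencil
    return sorted([(-qb - disc) / (2.0 * qa), (-qb + disc) / (2.0 * qa)])


if __name__ == "__main__":
    C = [[2.0, 0.5], [0.5, 1.0]]
    P = [[1.0, 0.2], [0.2, 0.5]]                      # p.d., so X'AX >= X'CX
    A = [[C[i][j] + P[i][j] for j in range(2)] for i in range(2)]
    print("(a) roots:", pencil_roots(A, C))           # both >= 1
    print("(b) |A|, |C|:", det2(A), det2(C))
    Ai, Ci = inv2(A), inv2(C)
    print("(e) diagonals of C^-1 and A^-1:", [Ci[0][0], Ci[1][1]], [Ai[0][0], Ai[1][1]])
```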


7. Efficiency. With respect to the estimators $y_j$ of Section 5, the discrimination efficiency at a point $P$ in the $k$-dimensional parameter space (P.S.) is defined by
$$\lambda = \frac{(d\theta)' H (d\theta)}{(d\theta)' G (d\theta)}. \tag{7.1}$$
We take $(d\theta)' G (d\theta)$ as the basis of the metric of (P.S.). The $g_{\alpha\beta}$ for $\alpha, \beta = 1, 2, \dots, k$ are the components of a covariant tensor of the second order which is called the fundamental tensor of the metric (Eisenhart [9]). Since $(d\theta)' H (d\theta) \le (d\theta)' G (d\theta)$ and both forms are positive definite, the roots of
$$|H - \lambda G| = 0 \tag{7.2}$$
are real, positive, and all $\le 1$. Accordingly there exists a real transformation of the $\theta$'s such that at a point $P$ in (P.S.) the forms in (7.1) may be written as
$$\lambda = \frac{\lambda_1 \, d\psi_1^2 + \lambda_2 \, d\psi_2^2 + \cdots + \lambda_k \, d\psi_k^2}{d\psi_1^2 + d\psi_2^2 + \cdots + d\psi_k^2}, \tag{7.3}$$
where $\lambda_1, \lambda_2, \dots, \lambda_k$ are the roots of (7.2) (Eisenhart [9], p. 108). If we write
$$\cos^2 \alpha_i = \frac{d\psi_i^2}{d\psi_1^2 + \cdots + d\psi_k^2}, \tag{7.4}$$
then (7.3) may be written as
$$\lambda = \lambda_1 \cos^2 \alpha_1 + \lambda_2 \cos^2 \alpha_2 + \cdots + \lambda_k \cos^2 \alpha_k. \tag{7.5}$$
The directions at the point $P$ determined by $\cos \alpha_1 = 1$, $\cos \alpha_2 = 1$, ..., are known as the principal directions determined by the tensor $h_{\alpha\beta}$ (Eisenhart [9], p. 110). Furthermore, at the point $P$ the finite maxima and minima of $\lambda$ defined by (7.1) are given by the principal directions at the point and are indeed the roots of (7.2). Since $(d\theta)' G (d\theta)$ is positive definite, $\lambda$ is finite for all directions (Eisenhart [9], par. 33).

As the estimation efficiency of the estimators $y_1, y_2, \dots, y_k$, we take the product of the discrimination efficiencies for the principal directions at the point $P$, that is,
$$\text{Eff} = \lambda_1 \lambda_2 \cdots \lambda_k = |H| / |G| \le 1, \tag{7.6}$$
which is invariant for all nonsingular transformations of the parameters, with equality holding if and only if the estimators are sufficient [14].
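As a numerical illustration of (7.1) to (7.6), the Python sketch below uses two hypothetical $2 \times 2$ matrices with $G - H$ positive definite. They are chosen only for illustration and do not come from any particular family of distributions. The sketch computes the roots of $|H - \lambda G| = 0$, compares their product with $|H|/|G|$, and shows that the directional efficiency $\lambda$ of (7.1) stays between the smallest and largest root. The function names are illustrative.

```python
import math


def det2(M):
    """Determinant of a 2x2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


def pencil_roots(H, G):
    """Roots of |H - lam G| = 0 for symmetric 2x2 H and p.d. G, in increasing order."""
    qa = det2(G)
    qb = -(H[0][0] * G[1][1] + H[1][1] * G[0][0] - 2.0 * H[0][1] * G[0][1])
    qc = det2(H)
    disc = math.sqrt(max(qb * qb - 4.0 * qa * qc, 0.0))
    return sorted([(-qb - disc) / (2.0 * qa), (-qb + disc) / (2.0 * qa)])


def discrimination_efficiency(H, G, d):
    """lambda in (7.1) for the direction d = d(theta)."""
    num = sum(d[i] * H[i][j] * d[j] for i in range(2) for j in range(2))
    den = sum(d[i] * G[i][j] * d[j] for i in range(2) for j in range(2))
    return num / den


if __name__ == "__main__":
    G = [[4.0, 1.0], [1.0, 2.0]]
    H = [[3.0, 0.5], [0.5, 1.5]]
    lam = pencil_roots(H, G)
    print(lam, lam[0] * lam[1], det2(H) / det2(G))  # Eff from (7.6), about 0.607
```

Because (7.6) is a ratio of determinants, a nonsingular reparametrization $\theta \to T\theta$ multiplies both $|H|$ and $|G|$ by the same factor, which is the invariance noted above.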


8. Asymptotic efficiency. Suppose we have $n$ independent observations from an $l$-variate population with $k$ parameters. It is also of interest to consider, instead of (7.1), the asymptotic discrimination efficiency at a point $P$ in (P.S.) defined by
$$\lambda = \frac{(d\theta)' \sigma^{-1} (d\theta)}{n (d\theta)' G (d\theta)}, \qquad n \text{ large}, \tag{8.1}$$
where the elements of the matrix $G$ are computed for a single observation from the $l$-variate population. Since $(d\theta)' \sigma^{-1} (d\theta) \le n (d\theta)' G (d\theta)$ and both forms are positive definite, the roots of
$$|\sigma^{-1} - \lambda n G| = 0 \tag{8.2}$$
are real, positive, and $\le 1$. As in Section 7, the roots of (8.2) are the finite maxima and minima of (8.1) at a point $P$ in (P.S.) and are given by the principal directions determined by the tensor $\sigma^{-1}$ at the point.

As the asymptotic estimation efficiency of the unbiased estimators $y_1, y_2, \dots, y_k$ (cf. Cramér [7], pp. 489, 494) we take the product of the asymptotic discrimination efficiencies for the principal directions at the point $P$, that is,
$$\text{Asymp Eff} = \lambda_1 \lambda_2 \cdots \lambda_k = |\sigma^{-1}| / |nG| \le 1, \qquad n \text{ large},$$
the equality holding for all $n$ if the estimators are sufficient and (4.5) is satisfied. If $|\sigma|\,|G| \to n^{-k}$, then the asymptotic efficiency approaches unity and $\lambda_i \to 1$ for $i = 1, 2, \dots, k$.


SOME FURTHER RESULTS IN SIMULTANEOUS CONFIDENCE
INTERVAL ESTIMATION

By S. N. ROY
University of North Carolina 


1. Summary. This paper is a follow-up of a previous paper [1], the full implications of some of the results there being brought out here in terms that are physically more meaningful. Two cases of simultaneous confidence bounds, I and II, are given, in each case with a confidence coefficient which is to be greater than or equal to a preassigned level.

Case I relates to the characteristic roots of $\Sigma$ and $\Sigma_1\Sigma_2^{-1}$, where $\Sigma$ stands for the dispersion matrix of one p-variate normal population and $\Sigma_1$ and $\Sigma_2$ for the dispersion matrices of two p-variate normal populations. Case II relates to a $(p + q)$-variate normal population ($p \le q$), for which the matrix of regression of the p-set on the q-set is defined in a natural manner. This matrix is denoted by $\beta(p \times q)$, and simultaneous confidence bounds are given on all bilinear compounds of this matrix (with arbitrary coefficient vectors of unit modulus).

Confidence bounds on the characteristic roots of $\Sigma$ and $\Sigma_1\Sigma_2^{-1}$ are given respectively by (3.1.3) and (3.2.8). Confidence bounds on the bilinear compounds of the regression matrix $\beta$ are given by (4.7).


2. Introduction. Let us denote by $A'$ the transpose of a matrix $A$, and shorten positive definite into p.d. and positive semidefinite into p.s.d. Also let $c_{\min}(M)$ and $c_{\max}(M)$ denote the smallest and the largest characteristic root of a p.d. matrix $M$. A $p \times p$ diagonal matrix whose diagonal elements are, say, $c_1, c_2, \dots, c_p$ will be denoted by $D_c(p \times p)$ or simply by $D_c$. A $p \times p$ unit matrix will be denoted by $I(p)$.


2.1. Statement and reduction of the problem for the case of $\Sigma$ and $\Sigma_1\Sigma_2^{-1}$. We take over from the previous paper [1] the two confidence statements (5.1.5) and (5.2.4) and renumber them as
$$a'a\,\theta_{1\alpha}(p, n) \le a'(D_{1/\sqrt{\theta}}\,\Gamma' nS \Gamma\, D_{1/\sqrt{\theta}})a \le a'a\,\theta_{2\alpha}(p, n), \tag{2.1.1}$$
$$(n_2/n_1)\,\theta_{1\alpha}(p, n_1, n_2)\, b'S_2 b \le b'(\mu D_{\sqrt{\theta}} \mu' S_1^{-1} \mu D_{\sqrt{\theta}} \mu')b \le (n_2/n_1)\,\theta_{2\alpha}(p, n_1, n_2)\, b'S_2 b. \tag{2.1.2}$$
These statements are supposed to hold respectively for all nonnull $a(p \times 1)$ and $b(p \times 1)$, each with a confidence coefficient $1 - \alpha$.

In (2.1.1), $S$ stands for the sample dispersion matrix, $n + 1$ for the sample size, and the $\theta$'s for the characteristic roots of $\Sigma$. Here $\Gamma$ is an orthogonal matrix given by $\Sigma = \Gamma D_\theta \Gamma'$, and $\theta_{1\alpha}(p, n)$ and $\theta_{2\alpha}(p, n)$ are subject only to the restriction
$$P(\theta_{1\alpha} \le \phi_1 \le \phi_p \le \theta_{2\alpha} \mid \Sigma = I(p)) = 1 - \alpha, \tag{2.1.3}$$
where $\phi_1$ and $\phi_p$ are the smallest and the largest characteristic roots of $nS$. Otherwise $\theta_{1\alpha}$ and $\theta_{2\alpha}$ are, for the moment, left flexible, unlike what was done in the previous paper [1].

Received 1/8/54, revised 6/29/54.

In (2.1.2), $S_1$ and $S_2$ stand for the two sample dispersion matrices, $n_1 + 1$ and $n_2 + 1$ for the two sample sizes, and the $\theta$'s for the characteristic roots of $\Sigma_1\Sigma_2^{-1}$. Here $\mu$ is a nonsingular matrix given by $\Sigma_1 = \mu D_\theta \mu'$ and $\Sigma_2 = \mu\mu'$, while $\theta_{1\alpha}(p, n_1, n_2)$ and $\theta_{2\alpha}(p, n_1, n_2)$ are subject only to the restriction
$$P(\theta_{1\alpha} \le \phi_1 \le \phi_p \le \theta_{2\alpha} \mid \Sigma_1 = \Sigma_2) = 1 - \alpha, \tag{2.1.4}$$
where $\phi_1$ and $\phi_p$ are the smallest and the largest characteristic roots of $(n_1/n_2) S_1 S_2^{-1}$. Otherwise $\theta_{1\alpha}$ and $\theta_{2\alpha}$ are, for the moment, left free, unlike the development of the previous paper [1].

Let us denote by $c(M)$ any characteristic root of the matrix $M$. Then it is well known that the statements (2.1.1) and (2.1.2) are respectively equivalent to
$$(1/n)\,\theta_{1\alpha}(p, n) \le \text{all } c(D_{1/\sqrt{\theta}} \Gamma' S \Gamma D_{1/\sqrt{\theta}}) \le (1/n)\,\theta_{2\alpha}(p, n), \tag{2.1.5}$$
$$(n_2/n_1)\,\theta_{1\alpha}(p, n_1, n_2) \le \text{all } c(\mu D_{\sqrt{\theta}} \mu' S_1^{-1} \mu D_{\sqrt{\theta}} \mu' S_2^{-1}) \le (n_2/n_1)\,\theta_{2\alpha}(p, n_1, n_2). \tag{2.1.6}$$
We notice that $\theta_i = c_i(\Sigma)$ in (2.1.5) and $\theta_i = c_i(\Sigma_1\Sigma_2^{-1})$ in (2.1.6), with $i = 1, \dots, p$. It is now our purpose to obtain confidence bounds on the $\theta_i$'s (or their functions) in terms of the $c_i(S)$'s (or their functions) in the case of (2.1.5), and in terms of the $c_i(S_1)$'s and $c_i(S_2)$'s (or their functions) in the case of (2.1.6). For the $c_i(\Sigma)$'s the confidence bounds are given by (3.1.3) and (3.1.4), and for the $c_i(\Sigma_1\Sigma_2^{-1})$'s by (3.2.8).

2.2. Statement and reduction of the problem for the case of the regression matrix $\beta$. We recall the confidence statement ([1], (6.1.4)), with a confidence coefficient $1 - \alpha$:
$$b - \frac{t_\alpha(n-2)}{\sqrt{n-2}} \sqrt{1 - r^2}\, \frac{s_1}{s_2} \le \beta \le b + \frac{t_\alpha(n-2)}{\sqrt{n-2}} \sqrt{1 - r^2}\, \frac{s_1}{s_2}, \tag{2.2.1}$$
where $\beta$ (which is now a scalar) stands for the population regression of $x_1$ on $x_2$ (where $x_1$ and $x_2$ have a bivariate normal distribution), $b$ for the sample regression (in a random sample of size $n \ge 3$), $r$ for the sample correlation, $s_1$ and $s_2$ for the two sample standard deviations, and $t_\alpha$ for the upper $\tfrac12\alpha$-point of the $t$-distribution with $(n - 2)$ D.F.

We also note that
$$b = r s_1 / s_2, \qquad \beta = \rho \sigma_1 / \sigma_2, \tag{2.2.2}$$
where $\rho$, $\sigma_1$, and $\sigma_2$ stand respectively for the population correlation coefficient and the two standard deviations.
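The interval (2.2.1) can be computed directly from paired data. The Python sketch below is illustrative; the function name and the sample data are hypothetical. The caller supplies the Student $t$ point $t_\alpha(n - 2)$, for example from a table, because the Python standard library has no $t$ quantile function. The divisor $n - 1$ used for $s_1$ and $s_2$ cancels in the half-width. By (2.2.2), the estimate $b = rs_1/s_2$ coincides with the least-squares slope $S_{12}/S_{22}$.

```python
import math


def slope_interval(x1, x2, t_alpha):
    """Interval (2.2.1) for beta, the regression of x1 on x2.

    t_alpha is the upper (alpha/2)-point of Student's t with n - 2 d.f.,
    supplied by the caller.
    """
    n = len(x1)
    if n < 3 or len(x2) != n:
        raise ValueError("need paired samples of size n >= 3")
    m1, m2 = sum(x1) / n, sum(x2) / n
    s11 = sum((u - m1) ** 2 for u in x1) / (n - 1)
    s22 = sum((v - m2) ** 2 for v in x2) / (n - 1)
    s12 = sum((u - m1) * (v - m2) for u, v in zip(x1, x2)) / (n - 1)
    s1, s2 = math.sqrt(s11), math.sqrt(s22)
    r = s12 / (s1 * s2)
    b = r * s1 / s2                      # (2.2.2): equals s12 / s22
    half = t_alpha / math.sqrt(n - 2) * math.sqrt(1.0 - r * r) * s1 / s2
    return b - half, b + half


if __name__ == "__main__":
    x2 = [1.0, 2.0, 3.0, 4.0, 5.0]
    x1 = [2.1, 3.9, 6.2, 7.8, 10.1]
    # t_{0.025}(3) = 3.182 from a t table, for a 95 per cent interval
    print(slope_interval(x1, x2, 3.182))
```

The half-width equals $t_\alpha$ times the usual standard error $\sqrt{\mathrm{SSE}/((n-2)S_{22})}$ of the least-squares slope, since $(1 - r^2) S_{11} = \mathrm{SSE}$.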
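We now start ([1], Sec. 6.2) with a random sample of size $n$, with $n > p + q$ and $p \le q$, from a $(p + q)$-variate normal population. Next we reduce for the means and set
$$(n - 1)\begin{pmatrix} S_{11} & S_{12} \\ S_{12}' & S_{22} \end{pmatrix} = \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix} (Y_1' \;\; Y_2'),$$
where $S_{11}$, $S_{22}$, and $S_{12}$ stand respectively for the sample dispersion submatrix of the p-set, that of the q-set, and that between the p-set and the q-set. Here $Y_1$ and $Y_2$ have a p.d.f. proportional to
$$\exp\left[-\tfrac12 \operatorname{tr} \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{12}' & \Sigma_{22} \end{pmatrix}^{-1} \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix} (Y_1' \;\; Y_2')\right]. \tag{2.2.3}$$
We next recall ([1], Sec. 6.2) that there exist nonsingular $\mu_1(p \times p)$ and $\mu_2(q \times q)$ such that
$$\Sigma_{11}(p \times p) = \mu_1 \mu_1', \qquad \Sigma_{22}(q \times q) = \mu_2 \mu_2', \qquad \Sigma_{12}(p \times q) = \mu_1 [D_{\sqrt{\theta}} \;\; 0] \mu_2', \tag{2.2.4}$$
where $D_{\sqrt{\theta}}$ is a $p \times p$ matrix and the $\theta$'s are the characteristic roots (all nonnegative) of the matrix $\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{12}'$ (i.e., the squares of the population canonical correlations between the p-set and the q-set). As in ([1], Sec. 6.2), we have
$$\begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{12}' & \Sigma_{22} \end{pmatrix}^{-1} = \begin{pmatrix} \mu_1^{-1} & 0 \\ 0 & \mu_2^{-1} \end{pmatrix}' \begin{pmatrix} D_{1/\sqrt{1-\theta}} & -[D_{\sqrt{\theta/(1-\theta)}} \;\; 0] \\ 0 & I(q) \end{pmatrix}' \begin{pmatrix} D_{1/\sqrt{1-\theta}} & -[D_{\sqrt{\theta/(1-\theta)}} \;\; 0] \\ 0 & I(q) \end{pmatrix} \begin{pmatrix} \mu_1^{-1} & 0 \\ 0 & \mu_2^{-1} \end{pmatrix}. \tag{2.2.5}$$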
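Going back to (2.2.3) and using the result that $\operatorname{tr}(AB) = \operatorname{tr}(BA)$, we have
$$\operatorname{tr} \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{12}' & \Sigma_{22} \end{pmatrix}^{-1} \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix} (Y_1' \;\; Y_2') = \operatorname{tr} \begin{pmatrix} D_{1/\sqrt{1-\theta}} & -[D_{\sqrt{\theta/(1-\theta)}} \;\; 0] \\ 0 & I(q) \end{pmatrix} \begin{pmatrix} \mu_1^{-1} Y_1 \\ \mu_2^{-1} Y_2 \end{pmatrix} \left(Y_1' \mu_1'^{-1} \;\; Y_2' \mu_2'^{-1}\right) \begin{pmatrix} D_{1/\sqrt{1-\theta}} & -[D_{\sqrt{\theta/(1-\theta)}} \;\; 0] \\ 0 & I(q) \end{pmatrix}'. \tag{2.2.6}$$
Setting
$$Z_1 = D_{1/\sqrt{1-\theta}} \mu_1^{-1} Y_1 - [D_{\sqrt{\theta/(1-\theta)}} \;\; 0] \mu_2^{-1} Y_2, \qquad Z_2 = \mu_2^{-1} Y_2, \tag{2.2.7}$$
it is easy to check from (2.2.3), (2.2.6), and (2.2.7) that $(Z_1, Z_2)$ have a p.d.f. proportional to
$$\exp\left[-\tfrac12 \operatorname{tr} \begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix} (Z_1' \;\; Z_2')\right]. \tag{2.2.8}$$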

Consider now, for any two arbitrary nonnull vectors $a_1(p \times 1)$ and $a_2(q \times 1)$ and for a fixed positive $\theta_0$, the statement

(2.2.9) $$\frac{(a_1'Z_1Z_2'a_2)^2}{(a_1'Z_1Z_1'a_1)(a_2'Z_2Z_2'a_2)} \le \theta_0.$$


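This can be written in terms of $Y_1$ and $Y_2$ as

(2.2.10) $$\frac{\left[a_1'\{D_{1/\sqrt{1-\theta}}\,\mu_1^{-1}Y_1 - (D_{\sqrt\theta/\sqrt{1-\theta}}\;\;0)\,\mu_2^{-1}Y_2\}Y_2'(\mu_2')^{-1}a_2\right]^2}{(a_2'\mu_2^{-1}Y_2Y_2'(\mu_2')^{-1}a_2)(a_1'QQ'a_1)} \le \theta_0,$$

where

(2.2.11) $$Q = D_{1/\sqrt{1-\theta}}\,\mu_1^{-1}Y_1 - (D_{\sqrt\theta/\sqrt{1-\theta}}\;\;0)\,\mu_2^{-1}Y_2.$$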
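Now putting

(2.2.12) $$b_1(1 \times p) = a_1'D_{1/\sqrt{1-\theta}}\,\mu_1^{-1}, \qquad b_2(1 \times q) = a_2'\mu_2^{-1},$$

and using (2.2.4), we check that (2.2.10) reduces to

$$\frac{[b_1(Y_1Y_2' - \beta Y_2Y_2')b_2']^2}{[b_2Y_2Y_2'b_2'][b_1(Y_1 - \beta Y_2)(Y_1' - Y_2'\beta')b_1']} \le \theta_0,$$

or

(2.2.13) $$\frac{[b_1(S_{12} - \beta S_{22})b_2']^2}{(b_2S_{22}b_2')[b_1(S_{11} - S_{12}\beta' - \beta S_{21} + \beta S_{22}\beta')b_1']} \le \theta_0,$$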
where

(2.2.14) $$\beta(p \times q) = \mu_1(D_{\sqrt\theta}\;\;0)\,\mu_2^{-1} = \Sigma_{12}\Sigma_{22}^{-1}.$$

As defined by (2.2.14), $\beta$ can appropriately be called the matrix of population regression of the $p$-set on the $q$-set. It is the only set of population parameters that occurs in statement (2.2.13).
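A small NumPy sketch in the same spirit can confirm the reduction of (2.2.9) to (2.2.13). All sizes and parameter values are hypothetical. It checks that the ratio computed from the $Z$'s equals the ratio computed from the sample dispersion submatrices, once $b_1$ and $b_2$ are formed as in (2.2.12) and $\beta$ as in (2.2.14). The common factor $(n-1)^2$ cancels, so $S$ is taken here as $YY'/(n-1)$.

```python
import numpy as np

rng = np.random.default_rng(1)
p, q, m = 2, 3, 12                       # m = n - 1 (hypothetical sizes, p <= q)
mu1 = rng.normal(size=(p, p)) + 3 * np.eye(p)
mu2 = rng.normal(size=(q, q)) + 3 * np.eye(q)
theta = np.array([0.25, 0.5])            # hypothetical population roots
C = np.hstack([np.diag(np.sqrt(theta)), np.zeros((p, q - p))])   # (D_sqrt(theta) 0)
beta = mu1 @ C @ np.linalg.inv(mu2)      # (2.2.14)

Y1, Y2 = rng.normal(size=(p, m)), rng.normal(size=(q, m))
d = np.diag(1 / np.sqrt(1 - theta))      # D_{1/sqrt(1-theta)}
Z1 = d @ np.linalg.solve(mu1, Y1) - d @ C @ np.linalg.solve(mu2, Y2)   # (2.2.7)
Z2 = np.linalg.solve(mu2, Y2)

Y = np.vstack([Y1, Y2])
S = Y @ Y.T / m                          # sample dispersion matrix
S11, S12, S22 = S[:p, :p], S[:p, p:], S[p:, p:]
S21 = S12.T

a1, a2 = rng.normal(size=p), rng.normal(size=q)
ratio_z = (a1 @ Z1 @ Z2.T @ a2) ** 2 / ((a1 @ Z1 @ Z1.T @ a1) * (a2 @ Z2 @ Z2.T @ a2))  # (2.2.9)

b1 = a1 @ d @ np.linalg.inv(mu1)         # (2.2.12), 1 x p
b2 = a2 @ np.linalg.inv(mu2)             # (2.2.12), 1 x q
num = (b1 @ (S12 - beta @ S22) @ b2) ** 2
den = (b2 @ S22 @ b2) * (b1 @ (S11 - S12 @ beta.T - beta @ S21 + beta @ S22 @ beta.T) @ b1)
ratio_s = num / den                      # (2.2.13)
print(ratio_z, ratio_s)
```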

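For a $p \times p$ matrix $B$, let $\operatorname{tr}_s(B)$, for $s = 1, \dots, p$, stand for the sum of all $s$th order principal minors of $B$. It is well known that

(2.3.1) $$\operatorname{tr}_s(B) = \sum_{1 \le i_1 < i_2 < \cdots < i_s \le p} c_{i_1}(B)\,c_{i_2}(B)\cdots c_{i_s}(B),$$

and, in particular, that $\operatorname{tr}_1(B) = \sum_i c_i(B) = \sum_i b_{ii}$ and $\operatorname{tr}_p(B) = \prod_i c_i(B) = |B|$.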
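Identity (2.3.1) is easy to confirm numerically. This NumPy sketch is only an illustration and uses a randomly generated symmetric p.d. matrix. It computes $\operatorname{tr}_s(B)$ directly as a sum of principal minors and compares the result with the $s$th elementary symmetric function of the roots.

```python
import itertools
import numpy as np

def tr_s(B, s):
    """Sum of all s-th order principal minors of B (the definition of tr_s)."""
    p = B.shape[0]
    return sum(np.linalg.det(B[np.ix_(idx, idx)])
               for idx in itertools.combinations(range(p), s))

def tr_s_from_roots(B, s):
    """Right side of (2.3.1): s-th elementary symmetric function of the roots c_i(B)."""
    roots = np.linalg.eigvals(B)
    return sum(np.prod(roots[list(idx)])
               for idx in itertools.combinations(range(len(roots)), s)).real

rng = np.random.default_rng(5)
X = rng.normal(size=(4, 6))
B = X @ X.T                      # a symmetric p.d. test matrix with p = 4
for s in range(1, 5):
    print(s, tr_s(B, s), tr_s_from_roots(B, s))
```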
Also well known is that

(2.3.2) $$c[A(p \times p)B(p \times p)] = c[B(p \times p)A(p \times p)].$$


Furthermore we recall

Lemma A. The product of two p.d. matrices is p.d. If $A(p \times q)$ [rank $r \le \min(p, q)$] is a matrix with real elements, then $AA'$ is p.s.d. of rank $r$.

We have also [2] that

(2.3.3) $$c_{\min}(A)\,c_{\min}(B) \le \text{all } c(AB) \le c_{\max}(A)\,c_{\max}(B),$$

where A and B are two symmetric matrices of which one is p.d. and the other is at least p.s.d. The generalization to the product of a finite number of matrices is obvious [2]. We also take over from [2] the result that

(2.3.4) $$c_{\min}(MM') \le c^2(M) \le c_{\max}(MM'),$$

where M is a square matrix with real characteristic roots. From (2.3.3) it is easy to see, by replacing A by $AB^{-1}$ (if B is nonsingular), that

(2.3.5) $$c_{\min}(AB^{-1})\,c_{\min}(B) \le \text{all } c(A) \le c_{\max}(AB^{-1})\,c_{\max}(B).$$


Next, we establish

Lemma B. If $d_1 \le$ all $c(AB^{-1}) \le d_2$, then

$$(d_1)^t\operatorname{tr}_t(B) \le \operatorname{tr}_t(A) \le (d_2)^t\operatorname{tr}_t(B), \qquad t = 1, \dots, p,$$

where A and B are two $p \times p$ matrices and $d_1$ and $d_2$ any two positive numbers such that $d_1 \le d_2$.

The conclusion is a necessary (though not a sufficient) condition for the hypothesis.
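The following is a quick numerical illustration of Lemma B. As in the applications below, it assumes that $A$ and $B$ are symmetric p.d. The matrices are random, and $d_1$ and $d_2$ are taken as the extreme roots of $AB^{-1}$, which is the tightest admissible choice. This is a sanity check, not a proof.

```python
import itertools
import numpy as np

def tr_t(M, t):
    """Sum of all t-th order principal minors of M."""
    return sum(np.linalg.det(M[np.ix_(idx, idx)])
               for idx in itertools.combinations(range(M.shape[0]), t))

def lemma_b_holds(A, B, rtol=1e-9):
    """Check (d1)^t tr_t(B) <= tr_t(A) <= (d2)^t tr_t(B) for t = 1..p,
    with d1, d2 the smallest and largest roots of A B^{-1}."""
    roots = np.linalg.eigvals(A @ np.linalg.inv(B)).real
    d1, d2 = roots.min(), roots.max()
    checks = []
    for t in range(1, A.shape[0] + 1):
        lo, mid, hi = d1 ** t * tr_t(B, t), tr_t(A, t), d2 ** t * tr_t(B, t)
        checks.append(lo <= mid * (1 + rtol) and mid <= hi * (1 + rtol))
    return all(checks)

rng = np.random.default_rng(6)
p = 4
X, Y = rng.normal(size=(p, 2 * p)), rng.normal(size=(p, 2 * p))
A, B = X @ X.T, Y @ Y.T          # two random symmetric p.d. matrices (assumption)
print(lemma_b_holds(A, B))
```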

Proof. It is easy to check that the statement $d_1 <$ all $c(AB^{-1})$ is equivalent to the statement "$A - d_1B$ is p.d.", which again is equivalent to the statement "$A_t - d_1B_t$, for $t = 1, 2, \dots, p$, is p.d.", where $A_t - d_1B_t$ is a submatrix formed by the intersection of any $t$ rows of $A - d_1B$ with $t$ columns bearing the same numbers. The last statement again is equivalent to the statement $d_1 <$ all $c(A_tB_t^{-1})$.

Now, if all $c(A_tB_t^{-1}) > d_1$, one consequence is that

(2.3.6) $$\prod_{i=1}^{t} c_i(A_tB_t^{-1}) > (d_1)^t, \quad\text{that is,}\quad \frac{|A_t|}{|B_t|} > (d_1)^t, \quad\text{that is,}\quad |A_t| > (d_1)^t|B_t|.$$
t=] t 


For a given $t$, summing over different possible submatrices we have

(2.3.7) $$\operatorname{tr}_t A > (d_1)^t\operatorname{tr}_t B.$$

Using the same kind of argument for the other half of the inequality, remembering that $t = 1, 2, \dots, p$, and combining, we have the result that

(2.3.8) if $d_1 <$ all $c(AB^{-1}) < d_2$, then $(d_1)^t\operatorname{tr}_t(B) < \operatorname{tr}_t(A) < (d_2)^t\operatorname{tr}_t(B)$, $t = 1, \dots, p$.

By a slight rephrasing (which is obviously permissible here) we can obtain Lemma B from (2.3.8). We recall the following three well-known lemmas, repeatedly used in [2].

Lemma C. The statement "$g_1 \le$ all $c(M) \le g_2$ (for a $p \times p$ real matrix M with real roots)" is equivalent to the statement "$g_1 \le d'(1 \times p)M(p \times p)d(p \times 1) \le g_2$ (for all arbitrary vectors d of unit modulus)".

Lemma D. The statement "$g_1 \le$ all $c(M_1M_2^{-1}) \le g_2$ (for two $p \times p$ real matrices $M_1$ and $M_2$ with real roots, $M_2$ being nonsingular)" is equivalent to the statement "$g_1 \le d'(1 \times p)M_1(p \times p)d(p \times 1)\,/\,d'(1 \times p)M_2(p \times p)d(p \times 1) \le g_2$ (for all arbitrary nonnull vectors d)".

Lemma E. The statement "$x'(1 \times q)x(q \times 1) \le h\ (> 0)$" is equivalent to the statement "$|x'(1 \times q)d(q \times 1)| \le \sqrt h$ (for all arbitrary vectors d of unit modulus)".
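Lemmas D and E can also be illustrated numerically. The sketch assumes that $M_1$ is symmetric and $M_2$ is symmetric p.d., which is the setting in which the lemmas are used here. It checks the Rayleigh-quotient form of Lemma D and the Cauchy-Schwarz form of Lemma E on randomly drawn vectors. Sampled vectors can only confirm the inequalities; they do not show that the extremes are attained.

```python
import numpy as np

rng = np.random.default_rng(7)
p = 3
X = rng.normal(size=(p, p))
M1 = X + X.T                               # symmetric (assumption)
Y = rng.normal(size=(p, 2 * p))
M2 = Y @ Y.T                               # symmetric p.d. (assumption)

roots = np.linalg.eigvals(M1 @ np.linalg.inv(M2)).real
g1, g2 = roots.min(), roots.max()

# Lemma D: d'M1 d / d'M2 d stays inside [g1, g2] for every nonnull d.
ds = rng.normal(size=(1000, p))
ratios = (np.einsum("ij,jk,ik->i", ds, M1, ds)
          / np.einsum("ij,jk,ik->i", ds, M2, ds))

# Lemma E: |x'd| <= sqrt(x'x) for every unit vector d.
x = rng.normal(size=p)
units = ds / np.linalg.norm(ds, axis=1, keepdims=True)
print(g1, ratios.min(), ratios.max(), g2)
```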

2.4. A result in set-theoretic logic. It is well known that the statement "If $E_1$, then $E_2$" is equivalent to the statement "$E_2$ is a necessary condition for $E_1$", which again is equivalent to the statement "$E_1 \subset E_2$". All these statements imply that "$P(E_1) \le P(E_2)$", which is a necessary (though not a sufficient) condition for the other statements. This will be used in the derivation of the confidence bounds.


3. Confidence bounds on $c(\Sigma)$'s and $c(\Sigma_1\Sigma_2^{-1})$'s.
3.1. Bounds on $c(\Sigma)$'s. Starting from (2.1.5) and noting that


(3.1.1) e(Dy yal’ STDy va) = c(STDiyel’) = c( SE"), 
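we have, with a confidence coefficient $1 - \alpha$, the equivalent confidence bounds

(3.1.2) $$(1/n)\,\theta_{1\alpha}(p, n) \le \text{all } c(S\Sigma^{-1}) \le (1/n)\,\theta_{2\alpha}(p, n),$$
$$n/\theta_{1\alpha}(p, n) \ge \text{all } c(\Sigma S^{-1}) \ge n/\theta_{2\alpha}(p, n).$$

From (2.3.5) we observe that this implies

(3.1.3) $$[n/\theta_{1\alpha}(p, n)]\,c_{\max}(S) \ge \text{all } c(\Sigma) \ge [n/\theta_{2\alpha}(p, n)]\,c_{\min}(S),$$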
which is thus a set of simultaneous confidence bounds with a confidence coefficient $\ge 1 - \alpha$. We note that, by using Lemma C, we can replace "all $c(\Sigma)$" occurring in the middle of (3.1.3) by "$a'\Sigma a$ (for all arbitrary vectors a of unit modulus)".

From Lemma B we also observe that (3.1.2) implies

(3.1.4) $$[n/\theta_{1\alpha}(p, n)]^t\operatorname{tr}_t(S) \ge \operatorname{tr}_t(\Sigma) \ge [n/\theta_{2\alpha}(p, n)]^t\operatorname{tr}_t(S),$$

for $t = 1, 2, \dots, p$, which is thus another set of simultaneous confidence bounds with a confidence coefficient $\ge 1 - \alpha$. Using (2.3.1), $\operatorname{tr}_t(S)$ and $\operatorname{tr}_t(\Sigma)$ are easily calculated in terms of the $c_i(S)$'s and $c_i(\Sigma)$'s.
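The bounds (3.1.3) and (3.1.4) are simple to compute once the constants are fixed. In this sketch, `theta1` and `theta2` are hypothetical stand-ins for $\theta_{1\alpha}(p, n)$ and $\theta_{2\alpha}(p, n)$, whose defining condition (2.1.3) lies outside this section. The constants enter through the reciprocals $n/\theta_{1\alpha}$ and $n/\theta_{2\alpha}$ implied by the first line of (3.1.2). The check builds $S$ and $\Sigma$ so that (3.1.2) holds exactly, then confirms that the implied bounds contain the roots and the $\operatorname{tr}_t$ of $\Sigma$.

```python
import itertools
import numpy as np

def tr_t(M, t):
    """Sum of all t-th order principal minors of M."""
    return sum(np.linalg.det(M[np.ix_(idx, idx)])
               for idx in itertools.combinations(range(M.shape[0]), t))

def sigma_bounds(S, n, theta1, theta2):
    """Bounds (3.1.3) on every root of Sigma and (3.1.4) on tr_t(Sigma), t = 1..p.
    theta1 < theta2 are hypothetical stand-ins for theta_{1a}(p, n), theta_{2a}(p, n)."""
    cS = np.linalg.eigvalsh(S)
    roots = (n / theta2 * cS.min(), n / theta1 * cS.max())
    traces = [((n / theta2) ** t * tr_t(S, t), (n / theta1) ** t * tr_t(S, t))
              for t in range(1, S.shape[0] + 1)]
    return roots, traces

# Construct Sigma and S so that (1/n) theta1 <= c(S Sigma^{-1}) <= (1/n) theta2 holds exactly.
rng = np.random.default_rng(8)
X, Y = rng.normal(size=(3, 9)), rng.normal(size=(3, 9))
n = 9
Sigma, S = X @ X.T, Y @ Y.T / n
r = np.linalg.eigvals(S @ np.linalg.inv(Sigma)).real
theta1, theta2 = n * r.min(), n * r.max()
(lo, hi), traces = sigma_bounds(S, n, theta1, theta2)
cSigma = np.linalg.eigvalsh(Sigma)
print(lo, cSigma, hi)
```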


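3.2. Bounds on $c(\Sigma_1\Sigma_2^{-1})$'s. Starting from (2.1.6) we have, with a confidence coefficient $1 - \alpha$, the confidence bounds

(3.2.1) $$\frac{n_1}{n_2\,\theta_{1\alpha}(p, n_1, n_2)} \ge \text{all } c\big(S_2(u')^{-1}D_{\sqrt\theta}u'S_1^{-1}uD_{\sqrt\theta}u^{-1}\big) \ge \frac{n_1}{n_2\,\theta_{2\alpha}(p, n_1, n_2)}.$$

Using (2.3.2) and (2.3.5) we have

(3.2.2) $$c_{\max}\big[S_2(u')^{-1}D_{\sqrt\theta}u'S_1^{-1}uD_{\sqrt\theta}u^{-1}\big]\,c_{\max}(S_2^{-1}) \ge \text{all } c\big[(u')^{-1}D_{\sqrt\theta}u'S_1^{-1}uD_{\sqrt\theta}u^{-1}\big] = \text{all } c(S_1^{-1}A)$$
$$\ge c_{\min}\big[S_2(u')^{-1}D_{\sqrt\theta}u'S_1^{-1}uD_{\sqrt\theta}u^{-1}\big]\,c_{\min}(S_2^{-1}),$$

where

(3.2.3) $$A = (uD_{\sqrt\theta}u^{-1})\big((u')^{-1}D_{\sqrt\theta}u'\big) = (uD_{\sqrt\theta}u^{-1})(uD_{\sqrt\theta}u^{-1})'.$$

In the same way we have

(3.2.4) $$c_{\max}(S_1^{-1}A)\,c_{\max}(S_1) \ge \text{all } c(A) \ge c_{\min}(S_1^{-1}A)\,c_{\min}(S_1).$$

Furthermore, noting that

(3.2.5) $$c(uD_{\sqrt\theta}u^{-1}) = c(D_{\sqrt\theta}) = \sqrt\theta = c\big((u')^{-1}D_{\sqrt\theta}u'\big),$$

and using (2.3.5), we have

(3.2.6) $$c_{\max}(A) \ge \text{all } c^2(uD_{\sqrt\theta}u^{-1}) = \text{all } c^2(D_{\sqrt\theta}) = \text{all } \theta_i \ge c_{\min}(A).$$

Combining (3.2.2), (3.2.4) and (3.2.6), we have

(3.2.7) $$c_{\max}\big(S_2(u')^{-1}D_{\sqrt\theta}u'S_1^{-1}uD_{\sqrt\theta}u^{-1}\big)\,c_{\max}(S_2^{-1})\,c_{\max}(S_1) \ge \text{all } \theta_i \ge c_{\min}\big(S_2(u')^{-1}D_{\sqrt\theta}u'S_1^{-1}uD_{\sqrt\theta}u^{-1}\big)\,c_{\min}(S_2^{-1})\,c_{\min}(S_1).$$

From this it is easy to check that (3.2.1) implies

(3.2.8) $$\frac{n_1}{n_2\,\theta_{1\alpha}(p, n_1, n_2)}\,c_{\max}(S_2^{-1})\,c_{\max}(S_1) \ge \text{all } c(\Sigma_1\Sigma_2^{-1}) \ge \frac{n_1}{n_2\,\theta_{2\alpha}(p, n_1, n_2)}\,c_{\min}(S_2^{-1})\,c_{\min}(S_1),$$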
which is thus a set of simultaneous confidence bounds with a confidence coefficient $\ge 1 - \alpha$. We observe that, by using Lemma D, we can replace "all $c(\Sigma_1\Sigma_2^{-1})$" occurring in the middle of (3.2.8) by "$a'\Sigma_1a/a'\Sigma_2a$ (for all arbitrary nonnull vectors $a(p \times 1)$)". We notice that

$$c_{\max}(S_2^{-1}) = 1/c_{\min}(S_2), \qquad c_{\min}(S_2^{-1}) = 1/c_{\max}(S_2).$$

Confidence bounds in terms of $\operatorname{tr}_t$ could also be given as in (3.1.4), but in this case the bounds would be more complicated and would appear to be less worthwhile than in the previous case.
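The chain of inequalities that leads to (3.2.7) holds for any nonsingular $u$, positive $\theta$'s and p.d. $S_1$, $S_2$. It can therefore be checked numerically without any distribution theory. The NumPy sketch below uses hypothetical values throughout and also checks the intermediate statement (3.2.6).

```python
import numpy as np

rng = np.random.default_rng(9)
p = 3
u = rng.normal(size=(p, p)) + 2 * np.eye(p)          # any nonsingular u
theta = np.array([0.5, 1.5, 4.0])                    # hypothetical positive roots
X1, X2 = rng.normal(size=(p, 20)), rng.normal(size=(p, 25))
S1, S2 = X1 @ X1.T / 20, X2 @ X2.T / 25              # sample dispersion matrices

K = u @ np.diag(np.sqrt(theta)) @ np.linalg.inv(u)   # u D_sqrt(theta) u^{-1}
M = K.T @ np.linalg.inv(S1) @ K                      # (u')^{-1} D u' S1^{-1} u D u^{-1}
A = K @ K.T                                          # the matrix A of (3.2.3)

r = np.linalg.eigvals(S2 @ M).real                   # the roots bounded in (3.2.1)
cS1 = np.linalg.eigvalsh(S1)
cS2inv = np.linalg.eigvalsh(np.linalg.inv(S2))
cA = np.linalg.eigvalsh(A)

upper = r.max() * cS2inv.max() * cS1.max()           # left side of (3.2.7)
lower = r.min() * cS2inv.min() * cS1.min()           # right side of (3.2.7)
print(lower, theta, upper)
```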

3.3. Determination of the constants $(\theta_{1\alpha}(p, n), \theta_{2\alpha}(p, n))$ and $(\theta_{1\alpha}(p, n_1, n_2), \theta_{2\alpha}(p, n_1, n_2))$ occurring in the confidence bounds. It has been stated in Section 2 that the pair $\theta_{1\alpha}(p, n)$, $\theta_{2\alpha}(p, n)$ for the first problem and the pair $\theta_{1\alpha}(p, n_1, n_2)$, $\theta_{2\alpha}(p, n_1, n_2)$ for the second problem satisfy respectively the conditions (2.1.3) and (2.1.4), but are otherwise free. It is well known how the shortness (in the sense of probability) of a confidence interval (or intervals) ties in with the power of the associated test. Let us consider the associated tests, or rather, the acceptance regions of the respective hypotheses (i) $H(\Sigma = \Sigma_0)$ and (ii) $H(\Sigma_1 = \Sigma_2)$. They are, respectively,

(3.3.1) $H(\Sigma = \Sigma_0)$: $\theta_{1\alpha}(p, n) \le \theta_1 \le \theta_p \le \theta_{2\alpha}(p, n)$,
(3.3.2) $H(\Sigma_1 = \Sigma_2)$: $\theta_{1\alpha}(p, n_1, n_2) \le \theta_1 \le \theta_p \le \theta_{2\alpha}(p, n_1, n_2)$.

In the first case it is possible to choose $\theta_{1\alpha}$ and $\theta_{2\alpha}$ (and this choice will be unique) so as to let the second kind of error (which, aside from $p$, $n$ and $\alpha$, depends only on the characteristic roots of $\Sigma\Sigma_0^{-1}$) have a (local) minimum, that is, to let the power have a local maximum at $\Sigma = \Sigma_0$, when $\Sigma \ne \Sigma_0$ is supposed to be the alternative. In this case the resulting power function monotonically increases as each $c_i(\Sigma\Sigma_0^{-1})$ tends away from unity, provided that all are $\ge 1$ or all are $\le 1$ to begin with.

In the second case, we have an exactly similar situation, $H(\Sigma = \Sigma_0)$ being replaced by $H(\Sigma_1 = \Sigma_2)$ and $\Sigma\Sigma_0^{-1}$ being replaced by $\Sigma_1\Sigma_2^{-1}$. The effect of this on the shortness, in the probability sense, of the resulting confidence bounds is obvious and need not be discussed in detail.

The results just stated are proved in another paper to be published shortly. It may be noticed, however, that for any pair $(\theta_{1\alpha}, \theta_{2\alpha})$ subject only to (2.1.3) or (2.1.4), we still obtain the confidence bounds of Sections 3.1 and 3.2, with confidence coefficients $\ge 1 - \alpha$. The only difference is that they will not have the property of "shortness" possessed by those based on $(\theta_{1\alpha}, \theta_{2\alpha})$ determined in the above way.


4. Confidence bounds on the regression matrix $\Sigma_{12}\Sigma_{22}^{-1}$ or $\beta$. It is well known [1] that the statement (2.2.13), for all arbitrary nonnull $b_1$ and $b_2$, is exactly equivalent to

(4.1) all $\theta_i \le \theta_0$, or $\theta_p \le \theta_0$,

where the $\theta_i$'s, for $i = 1, \dots, p$ and $0 < \theta_1 \le \cdots \le \theta_p < 1$, are the roots of the determinantal equation in $\theta$

(4.2) $$\left|\theta(S_{11} - S_{12}\beta' - \beta S_{21} + \beta S_{22}\beta') - (S_{12} - \beta S_{22})S_{22}^{-1}(S_{21} - S_{22}\beta')\right| = 0.$$


Now put $\lambda = \theta/(1 - \theta)$, so that we have, from (4.2), the determinantal equation in $\lambda$

(4.3) $$\left|\lambda(S_{11} - S_{12}S_{22}^{-1}S_{21}) - (S_{12}S_{22}^{-1} - \beta)S_{22}(S_{22}^{-1}S_{21} - \beta')\right| = 0.$$

Statement (4.1) can now be replaced by the statement that the largest root of (4.3) is not greater than $\lambda_0 = \theta_0/(1 - \theta_0)$, that is,

(4.4) $$\text{all } c\left[(S_{11} - S_{12}S_{22}^{-1}S_{21})^{-1}(B - \beta)S_{22}(B' - \beta')\right] \le \theta_0/(1 - \theta_0),$$

where $B(p \times q) = S_{12}S_{22}^{-1}$. This $B$ may appropriately be called the matrix of sample regression of the $p$-set on the $q$-set.
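The passage from (4.2) to (4.3) can be checked numerically. The roots $\lambda$ of (4.3) should be exactly $\theta/(1-\theta)$ for the roots $\theta$ of (4.2). In this NumPy sketch the data and the trial value of $\beta$ are hypothetical. The roots are obtained as characteristic roots of $Q^{-1}R$, where `Q` and `R` are names introduced here for the two matrices in (4.2).

```python
import numpy as np

rng = np.random.default_rng(4)
p, q, m = 2, 3, 15
X = rng.normal(size=(p + q, m))
S = X @ X.T / m                                # hypothetical sample dispersion matrix
S11, S12, S22 = S[:p, :p], S[:p, p:], S[p:, p:]
S21 = S12.T
beta = rng.normal(size=(p, q))                 # hypothetical trial value of beta
B = S12 @ np.linalg.inv(S22)                   # sample regression matrix

Q = S11 - S12 @ beta.T - beta @ S21 + beta @ S22 @ beta.T
R = (S12 - beta @ S22) @ np.linalg.inv(S22) @ (S21 - S22 @ beta.T)
theta = np.sort(np.linalg.eigvals(np.linalg.solve(Q, R)).real)       # roots of (4.2)

E = S11 - S12 @ np.linalg.inv(S22) @ S21
F = (B - beta) @ S22 @ (B - beta).T
lam = np.sort(np.linalg.eigvals(np.linalg.solve(E, F)).real)         # roots of (4.3)
print(theta, lam, theta / (1 - theta))
```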

We note that (4.4) is equivalent to (4.1), which again is equivalent to (2.2.9), so that $\theta_p$ is the largest characteristic root of the matrix $(Z_1Z_1')^{-1}(Z_1Z_2')(Z_2Z_2')^{-1}(Z_2Z_1')$, where $(Z_1, Z_2)$ have the p.d.f. given by (2.2.8). The joint distribution of these central $\theta_i$'s, and also that of the largest root $\theta_p$, are known. Thus, to make (4.4) (that is, (4.1), that is, (2.2.9)) a simultaneous confidence statement with a joint confidence coefficient $1 - \alpha$, all that we have to do is to choose $\theta_0 = \theta_\alpha(p, q, n - 1) = \theta_\alpha$ (say), where $\theta_0$ or $\theta_\alpha$ is defined by $P(\text{central } \theta_p \ge \theta_\alpha) = \alpha$.

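Now, as in Sections 3.1 and 3.2, using (2.3.5) and the result in Section 2.4, we have from (4.4), with a joint confidence coefficient $\ge 1 - \alpha$, the simultaneous confidence statement that

(4.5) $$\text{all } c[(B - \beta)(B' - \beta')] \le [\theta_\alpha/(1 - \theta_\alpha)]\,c_{\max}(S_{11} - S_{12}S_{22}^{-1}S_{21})\,c_{\max}(S_{22}^{-1}).$$

We now note that

$$c_{\max}(S_{22}^{-1}) = 1/c_{\min}(S_{22}),$$
$$c_{\max}(S_{11} - S_{12}S_{22}^{-1}S_{21}) \le c_{\max}(S_{11})\,c_{\max}(I - S_{11}^{-1}S_{12}S_{22}^{-1}S_{21}),$$
$$c_{\max}(I - S_{11}^{-1}S_{12}S_{22}^{-1}S_{21}) = 1 - c_{\min}(S_{11}^{-1}S_{12}S_{22}^{-1}S_{21}).$$

Using these, we check that (4.5) can be replaced (with a confidence coefficient $\ge 1 - \alpha$) by

(4.6) $$\text{all } c[(B - \beta)(B' - \beta')] \le \frac{\theta_\alpha}{1 - \theta_\alpha}\left[1 - c_{\min}(S_{11}^{-1}S_{12}S_{22}^{-1}S_{21})\right]\frac{c_{\max}(S_{11})}{c_{\min}(S_{22})}.$$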


Letting $h$ denote the right side of (4.6), and applying Lemmas C and E to (4.6), we have, with a joint confidence coefficient $\ge 1 - \alpha$, the following equivalent simultaneous confidence statements for all arbitrary unit modulus vectors $d_1(p \times 1)$ and $d_2(q \times 1)$:

(4.7) $$|d_1'(B - \beta)d_2| \le \sqrt h, \quad\text{i.e.,}\quad d_1'Bd_2 - \sqrt h \le d_1'\beta d_2 \le d_1'Bd_2 + \sqrt h.$$
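Once $\theta_\alpha$ is known, computing $h$ and the intervals (4.7) is mechanical. The sketch below is an illustration only. `beta_interval` is a hypothetical helper, and the value passed for $\theta_\alpha$ is arbitrary, since the percentage points of the largest root are not tabulated here. The usage example takes $p = q = 1$, where the interval is $b \pm \sqrt h$ about the sample regression coefficient.

```python
import numpy as np

def beta_interval(S, p, theta_alpha, d1, d2):
    """h from (4.6) and the interval (4.7) for d1' beta d2 (hypothetical helper).
    S is the (p+q) x (p+q) sample dispersion matrix; theta_alpha is an assumed constant."""
    S11, S12, S22 = S[:p, :p], S[:p, p:], S[p:, p:]
    S21 = S12.T
    B = S12 @ np.linalg.inv(S22)                                   # sample regression matrix
    ratio = np.linalg.solve(S11, S12 @ np.linalg.solve(S22, S21))  # S11^{-1} S12 S22^{-1} S21
    c_min = np.linalg.eigvals(ratio).real.min()
    h = (theta_alpha / (1 - theta_alpha)) * (1 - c_min) \
        * np.linalg.eigvalsh(S11).max() / np.linalg.eigvalsh(S22).min()
    d1 = np.asarray(d1, dtype=float) / np.linalg.norm(d1)          # unit modulus vectors
    d2 = np.asarray(d2, dtype=float) / np.linalg.norm(d2)
    centre = d1 @ B @ d2
    return centre - np.sqrt(h), centre + np.sqrt(h), h

# p = q = 1: s1 = 2, s2 = 1, r = 0.6, so b = r s1 / s2 = 1.2.
S = np.array([[4.0, 1.2], [1.2, 1.0]])
print(beta_interval(S, 1, 0.2, [1.0], [1.0]))
```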
A set of simultaneous confidence bounds on just the elements $\beta_{ij}$ of the $\beta$-matrix would be a subset of the bounds on the total set $d_1'\beta d_2$. It is worthwhile to check that, if $p = q = 1$, (4.7) reduces, as it should, to (2.2.1). Also, if $p = 1$, we should have another special case of (4.7) giving a set of simultaneous confidence bounds on all linear functions of the partial regressions of one variate on several others. Thus, in several ways, (4.7) seems to be an appropriate generalization of (2.2.1).


REFERENCES
[1] S. N. Roy and R. C. Bose, "Simultaneous confidence interval estimation," Ann. Math. Stat., Vol. 24 (1953), pp. 513-536.
[2] S. N. Roy, "A useful theorem in matrix theory," Proc. Amer. Math. Soc., Vol. 5 (1954), pp. 635-638.





TABLES FOR THE DISTRIBUTION OF THE NUMBER OF
EXCEEDANCES¹


By BENJAMIN EPSTEIN
Wayne University


1. Introduction and Summary. Consider a random sample of size $n$ taken from a continuous distribution $f(x)$. Let another random sample, independent of the first sample and also of size $n$, be drawn from the same population. Let $U_r^n$ be the random variable associated with the number of values in the second sample which exceed the $r$th smallest value in the first sample. Similarly let $V_s^n$ be the random variable associated with the number of values in the second sample which exceed the $s$th largest value in the first sample. Since the $r$th smallest value in a sample of size $n$ is at the same time the $s$th largest value in the sample with $s = n - r + 1$, it follows that

(1) $$\Pr(U_r^n = x) = \Pr(V_s^n = x), \qquad s = n - r + 1.$$

The probability distribution of $U_r^n$ (and hence of $V_s^n$) is given by

(2) $$\Pr(U_r^n = x) = \binom{n+r-x-1}{r-1}\binom{n-r+x}{x}\Big/\binom{2n}{n} = \tfrac12\,P_{n+r-x-1,\,r-1}\,P_{n-r+x,\,x}\big/P_{2n,\,n}, \qquad x = 0, 1, 2, \dots, n.$$


Formula (2) can be proved by combinatorial methods; details are omitted. An 
alternative formula, derived in another way [3], is 


(2a) Pr (U?; = x) = £(P7')(P), ae = OP «.tstt asl ten -l,n—r+z 


In formulae (2) and (2a), $P_{n,x} = (\tfrac12)^n\binom{n}{x}$. Formulae in terms of $P_{n,x}$ are particularly convenient for hand computation, since one can use the extensive tables of the binomial probability distribution published by the National Bureau of Standards.

If the values of $\Pr(U_r^n \le x)$, for $x = 0, 1, 2, \dots, n - 1$ and $r = 1, 2, \dots, n$, are written (for fixed $n$) in matrix form, one notes certain useful symmetries, which can be expressed by the identities

(3) $$\Pr(U_r^n \le x) = \Pr(U_{x+1}^n \le r - 1),$$

(4) $$\Pr(U_r^n \le x) + \Pr(U_{n-r+1}^n \le n - x - 1) = 1.$$

If one takes $x = n - r$ in (4) and uses the relation (3), it is readily verified that

(5) $$\Pr(U_r^n \le n - r) = \tfrac12.$$


Received 7/20/53, revised 9/18/53.
¹ Work sponsored by the Office of Naval Research.
TABLE 2
Values of $\Pr(U_r^5 \le x)$

         x = 0    x = 1    x = 2    x = 3    x = 4
r = 1    .00397   .0238    .0833    .2222    .5000
r = 2    .0238    .1032    .2619    .5000    .7778
r = 3    .0833    .2619    .5000    .7381    .9167
r = 4    .2222    .5000    .7381    .8968    .9762
r = 5    .5000    .7778    .9167    .9762    .99603


Proofs² of (3), (4), and (5) can be obtained by using the results of pages 257-258 of [3]. Because of these symmetries, the complete matrix (for any fixed $n$) can be constructed if one knows only the quantities $\Pr(U_r^n \le x)$, $r = 1(1)[n/2]$, $x = r - 1, r, r + 1, \dots, n - r - 1$. In Table 1 these values are given³ for $n = 2(1)15(5)20$. To see how the complete matrix is obtained from Table 1, it is instructive to verify, using (3), (4), and (5), that the complete matrix in the special case $n = 5$ is given by Table 2.
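Formula (2) and the symmetries (3), (4) and (5) can be verified exactly with rational arithmetic. The Python sketch below uses only the standard library; the function names are ours. It reproduces Table 2 and checks (3), (4) and (5) for several small $n$.

```python
from fractions import Fraction
from math import comb

def pmf_U(n, r, x):
    """Pr(U_r^n = x) from formula (2), as an exact fraction."""
    return Fraction(comb(n + r - x - 1, r - 1) * comb(n - r + x, x), comb(2 * n, n))

def cdf_U(n, r, x):
    """Pr(U_r^n <= x)."""
    return sum((pmf_U(n, r, k) for k in range(x + 1)), Fraction(0))

def table(n):
    """Rows r = 1..n and columns x = 0..n-1, laid out as in Table 2."""
    return [[cdf_U(n, r, x) for x in range(n)] for r in range(1, n + 1)]

for row in table(5):
    print("  ".join(f"{float(v):.5f}" for v in row))
```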

A somewhat different, but related, exceedance problem is to take two random samples of size $n$ from a continuous distribution $f(x)$. Let us for convenience attach the letter $x$ to one of the samples and the letter $y$ to the other sample. Further let $x_{r,n}$ and $y_{r,n}$ be respectively the $r$th smallest observations in each of the samples. Let us define $z_{r,n} = \max(x_{r,n}, y_{r,n})$. If $z_{r,n} = x_{r,n}$, count the number of $y$'s which are $\ge x_{r,n}$; if $z_{r,n} = y_{r,n}$, count the number of $x$'s which are $\ge y_{r,n}$. Denoting the number of exceedances as $W_r^n$, it is readily seen from (1) that the probability distribution of $W_r^n$ is given by

(6) $$\Pr(W_r^n = x) = 2\binom{n+r-x-1}{r-1}\binom{n-r+x}{x}\Big/\binom{2n}{n}, \qquad x = 0, 1, 2, \dots, n - r.$$


It is evident from the definition that

(7) $$\Pr(W_r^n \le x) = 1, \qquad x \ge n - r.$$

Clearly one can find the values of $\Pr(W_r^n \le x)$ by using Table 1. Thus, for example, in the special case $n = 5$ one obtains Table 3.
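Here is a similar short standard-library sketch of (6) and (7). Printing the rows for $n = 5$ gives the entries of Table 3. Only the values $x = 0, \dots, n - r$ are listed, since by (7) the cdf equals 1 from $x = n - r$ on.

```python
from fractions import Fraction
from math import comb

def pmf_W(n, r, x):
    """Pr(W_r^n = x) from (6); W_r^n takes only the values 0, 1, ..., n - r."""
    if not 0 <= x <= n - r:
        return Fraction(0)
    return Fraction(2 * comb(n + r - x - 1, r - 1) * comb(n - r + x, x), comb(2 * n, n))

def cdf_W(n, r, x):
    """Pr(W_r^n <= x)."""
    return sum((pmf_W(n, r, k) for k in range(x + 1)), Fraction(0))

# Table 3: rows r = 1..5, columns x = 0..n-r.
for r in range(1, 6):
    print(r, [f"{float(cdf_W(5, r, x)):.4f}" for x in range(5 - r + 1)])
```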


2. Applications of exceedance theory. There are three principal uses of ex- 
ceedance theory. These are: 

(a) Floods and droughts. This theory was used by H. A. Thomas, Jr. [6] in 
making predictions about the recurrences of floods and droughts in the future 
on the basis of what is known from past data. In recent papers by Chow [1], [2], 
the interested reader will find further work in this direction. 


² We wish to acknowledge with thanks a communication from Dr. E. J. Gumbel on this point.

³ In Wayne University Technical Report No. 6 (July 1953) values were given for $n = 2(1)20(5)50$. We have also considered the practically important case where the two samples may be of unequal size. Tables for selected pairs of unequal values of the sample size will be available in the near future.
TABLE 3
$\Pr(W_r^5 \le x)$

         x = 0    x = 1    x = 2    x = 3    x = 4
r = 1    .00794   .0476    .1667    .4444    1.0000
r = 2    .0476    .2063    .5238    1.0000
r = 3    .1667    .5238    1.0000
r = 4    .4444    1.0000
r = 5    1.0000


(b) Non-parametric tests for slippage. The functions $U_r^n$, $V_s^n$, and $W_r^n$ can be used to give two-sample nonparametric tests for slippage of the mean. There are close connections between the results in this paper and recent tests for slippage by Mosteller and Tukey [4] and [5].

(c) Life testing. It is a characteristic feature of life tests that data become 
available in order of size. Thus it becomes very natural to apply exceedance 
theory, which is based purely on order statistics. By so doing it is possible in 
many cases to shorten both the average time and average number of items de- 
stroyed in order to reach a decision as to whether or not the items in one popula- 
tion are in some sense superior to the items in another population. 


3. Numerical examples. 


EXAMPLE 1. What is the probability that the third largest flood during the past 20 years will be exceeded at least once during the next 20 years? Answer. The probability is

$$p = 1 - \Pr(V_3^{20} = 0) = 1 - \Pr(U_{18}^{20} = 0) = 1 - .1154 = .8846.$$

EXAMPLE 2. During a period of 20 years the lowest observed annual rainfall in a certain locality was 8.6 inches. What is the probability that in the next 20 years at least two of the years will have rainfall $\le 8.6$ inches? Answer. The probability is $p = \Pr(U_1^{20} \le 18) = .2436$.
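Both answers follow from formula (2) alone. This sketch recomputes them with the standard library, using relation (1) to turn the third largest value into the 18th smallest.

```python
from math import comb

def pmf_U(n, r, x):
    """Pr(U_r^n = x) from formula (2)."""
    return comb(n + r - x - 1, r - 1) * comb(n - r + x, x) / comb(2 * n, n)

n = 20
# Example 1: the 3rd largest of 20 is the 18th smallest (s = 3, r = n - s + 1), so V_3 = U_18.
p1 = 1 - pmf_U(n, n - 3 + 1, 0)
# Example 2: at least two of the next 20 values at or below the old minimum
# means at most 18 of them exceed it, i.e. U_1 <= 18.
p2 = sum(pmf_U(n, 1, x) for x in range(19))
print(round(p1, 4), round(p2, 4))
```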

EXAMPLE 3 (one-sided test). We are now interested in making a choice between two lots A and B. In particular we are interested in some characteristic such as life or strength, where data become available in order of magnitude. Let it be known a priori that the probability density function associated with lot B is either the same as that of lot A or is displaced to the left (i.e., is inferior). Put another way, we are thinking of a case where the only relevant parameter is some measure of slippage. We wish to test the hypothesis $H_0$ of no displacement against the alternative $H_1$ that B is displaced to the left of A. The Type I error is taken to be < .05. Ten items are drawn from each of the lots and placed on life test. It is decided in advance that a decision will be based on how many failures occur in the sample from B before the second failure occurs in the sample from A. The two samples are put on test simultaneously and give the pattern bbbabbb..., where a denotes a failure in the sample drawn from A and b denotes a failure in the sample drawn from B. The experiment is stopped at the seventh failure with rejection of the null hypothesis, because $\Pr(U_2^{10} \le 4) = .0286 < .05$. If, however, we had obtained a pattern like babba..., we would have stopped experimentation after the fifth failure with the acceptance of $H_0$.

EXAMPLE 4 (two-sided test). Given two lots A and B, we wish to test the null hypothesis that the life distributions of A and B are the same against the alternative that they are different. As in Example 3, let 10 items be drawn at random from each of the two lots and placed on life test. It is decided in advance that our decision will be based on the statistic $W_2^{10}$. If, for example, the failure pattern observed is aaaaabaa..., the experiment will be terminated on the eighth trial with rejection of the null hypothesis (at the .05 level of significance). This is because $\Pr(W_2^{10} \le 3) = .0198$. On the other hand a pattern like babba... would lead to acceptance of the null hypothesis on the fifth trial.

4. Discussion. Fairly extensive random sampling experiments have shown that the statistics $W_1^{10}$, $W_2^{10}$, and $W_3^{10}$ are more effective than the run test, and somewhat less effective than the Wilcoxon rank test, for detecting slippage of the mean in the case where the underlying distributions are normal, all with the same variance. Since the improvement in power obtained by using $W_2^{10}$ or $W_3^{10}$ rather than $W_1^{10}$ is minor in this case, there are sound practical reasons for preferring $W_1^{10}$. Decisions based on this statistic can be made at a great saving in average time to decision, as well as in average number of items destroyed. It should be noted in Example 4 that if decisions were based on $W_1^{10}$, we would have truncated testing on the fifth trial with the rejection of $H_0$, since $\Pr(W_1^{10} \le 5) = .0325$.
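The probabilities quoted in Examples 3 and 4 and in the Discussion can be recomputed from (2), (6) and (7) in the same way. The sketch uses only the standard library, and the helper names are ours.

```python
from math import comb

def cdf_U(n, r, x):
    """Pr(U_r^n <= x) from formula (2)."""
    return sum(comb(n + r - k - 1, r - 1) * comb(n - r + k, k)
               for k in range(x + 1)) / comb(2 * n, n)

def cdf_W(n, r, x):
    """Pr(W_r^n <= x): twice the U cdf by (6), and 1 once x >= n - r by (7)."""
    return 1.0 if x >= n - r else 2 * cdf_U(n, r, x)

ex3 = cdf_U(10, 2, 4)    # Example 3: six b's before the 2nd a leave at most 4 exceedances
ex4 = cdf_W(10, 2, 3)    # Example 4: pattern aaaaabaa... judged by W_2^10
disc = cdf_W(10, 1, 5)   # Discussion: the same pattern judged by W_1^10 after five failures
print(round(ex3, 4), round(ex4, 4), round(disc, 4))
```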

A detailed discussion of the points raised in the last paragraph will appear 
elsewhere. 


5. Acknowledgement. I wish to thank John Lay for his work in computing 
the tables. 
ON THE FACTORIZATION OF DISTRIBUTIONS


By HENRY TEICHER
Purdue University 


1. Summary. A family of probability distributions is called "factor-closed" (f.c.) if it is closed under the operation of factorization. The classical binomial family and certain generalizations of it are shown to be f.c. The multinomial family is also f.c. Most families of infinitely divisible distributions are not f.c.


2. Introduction. If $F_1(x)$ and $F_2(x)$ are any two cumulative distribution functions (c.d.f.'s), the convolution (denoted by *) of $F_1$ with $F_2$ is again a c.d.f., say

(1) $$F = F(x) = F_1 * F_2 = \int F_1(x - y)\,dF_2(y) = \Pr\{X < x\},$$

where Pr denotes probability measure and $X$ is called the random variable (r.v.) possessing the c.d.f. $F$. Further, if $X_1$ and $X_2$ are independent r.v.'s having c.d.f.'s $F_1$ and $F_2$ with corresponding Fourier transforms or characteristic functions (c.f.'s) $\varphi_{X_1}(t)$ and $\varphi_{X_2}(t)$, then $F = F_1 * F_2$ is the c.d.f. of $X = X_1 + X_2$ having, as is well known, the c.f.

(2) $$\varphi_X(t) = \varphi_{X_1}(t)\,\varphi_{X_2}(t) = \int e^{itx}\,dF(x).$$

If one commences with $\varphi_X(t)$ or $F(x)$, any such representation as (2) or (1) is termed a factorization of $\varphi_X(t)$ or $F(x)$, and the components $\varphi_{X_i}(t)$ or $F_i(x)$ are called factors.
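For distributions concentrated on the integers 0, 1, 2, ..., the convolution (1) is a discrete convolution of probability vectors, and (2) becomes a product of polynomials in $z = e^{it}$. The standard-library sketch below uses arbitrary illustrative probabilities and checks that the c.f. of the convolution equals the product of the c.f.'s.

```python
import cmath

def convolve(f1, f2):
    """pmf of X1 + X2 for independent X1, X2 with pmfs f1, f2 on 0, 1, 2, ..."""
    out = [0.0] * (len(f1) + len(f2) - 1)
    for i, a in enumerate(f1):
        for j, b in enumerate(f2):
            out[i + j] += a * b
    return out

def cf(f, t):
    """phi(t) = sum_x f(x) e^{itx} for a pmf on 0, 1, 2, ..."""
    return sum(px * cmath.exp(1j * t * x) for x, px in enumerate(f))

f1 = [0.3, 0.7]                # a Bernoulli factor (illustrative values)
f2 = [0.2, 0.5, 0.3]           # a three-point factor (illustrative values)
f = convolve(f1, f2)
print(f)
for t in (0.3, 1.0, 2.5):
    print(t, abs(cf(f, t) - cf(f1, t) * cf(f2, t)))
```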

For an arbitrary distribution $F$, factorization is not unique. That is, $F = F_1 * F_2 = F_1 * F_3$ does not imply $F_2 = F_3$. If $F$ is infinitely divisible, this is no longer possible, since its c.f. then has no zeros and the common factor may be cancelled. Many results concerning factorization, as well as references, are given by Lévy [4], [5].

To avoid trivialities, we presume in what follows that all c.d.f.'s have at least two points of increase, and we consider two c.f.'s $\varphi_1(t)$ and $\varphi_2(t)$ as equivalent if, for some real $a$,

$$\varphi_1(t) = \exp\{iat\}\,\varphi_2(t).$$

The starting point of this investigation is the following

Definition. A family $\mathcal{F}$ of c.d.f.'s will be said to be decomposable ($\mathcal{F}'$) if, for any element $F$ of $\mathcal{F}$, the relationship $F = G_1 * G_2$ implies that $G_1$ and $G_2$ are members of the family $\mathcal{F}'$. In particular, if $\mathcal{F} = \mathcal{F}'$, the family $\mathcal{F}$ will be called factor-closed (f.c.).¹


Received 12/14/53.

¹ The class of all c.d.f.'s as well as the family of prime or indecomposable c.d.f.'s (i.e., those for which the only "factors" of $\varphi(t)$ are the trivial ones $\exp\{iat\}$ and $\varphi(t)\exp\{-iat\}$) are trivially f.c.
Thus, Cramér's theorem [1], [2] on the factorization of the normal distribution states that the normal family is f.c. A corresponding result of Raikov [3] avers that the Poisson family is f.c.

For later usage, $P(z)$ is defined to be a quasi-polynomial if, for real $d_j$ and $r_j$ and integral $m \ge 2$, $P(z) = \sum_{j=1}^m r_j z^{d_j}$. If, in addition, $m = 2$, $P(z)$ will be termed a binomial quasi-polynomial.


3. The general binomial family. We define a sequence of independent r.v.'s $\{X_j\}$, for $j = 1, 2, \dots, n \ge 2$, by

$$\Pr\{X_j = a_j\} = p_j, \qquad \Pr\{X_j = b_j\} = q_j = 1 - p_j,$$

where $a_j > b_j$ are real and $0 < p_j < 1$ for $j = 1, 2, \dots, n$. Let

$$c_j = a_j - b_j > 0, \qquad V = \sum_{j=1}^n X_j.$$

The c.f. of $V$ is

$$\varphi_V(t) = \prod_{j=1}^n (p_je^{ita_j} + q_je^{itb_j}) = \exp\Big\{it\sum_{j=1}^n b_j\Big\}\prod_{j=1}^n (p_je^{itc_j} + q_j).$$

It suffices to consider the equivalent c.f.

$$\varphi_V(t) = \prod_{j=1}^n (p_je^{itc_j} + q_j) = A\prod_{j=1}^n (e^{itc_j} + \vartheta_j),$$

where $A$ is a constant and $\vartheta_j = q_jp_j^{-1} > 0$. As $\varphi_V(t)$ depends on the parameters $a_j$, $b_j$, $p_j$, and $n$, it represents a family of c.f.'s, and there exists the corresponding family of c.d.f.'s, say $\mathcal{B}$, whose explicit form is not required here. This family will be dubbed the general binomial family since it constitutes an obvious generalization of the classical binomial distributions connected with coin tossing, etc. It will be shown to be f.c. under certain conditions.

As the c.d.f. of $X_j$ and hence of $V$ is a step function with a finite number of jumps, the same must be true of any factor of the c.d.f. of $V$. We may therefore confine our attention (in looking for factors of $\varphi_V(t)$) to c.f.'s of the form $\varphi(t) = \sum_{j=1}^m r_j\exp\{itd_j\}$, where $r_j$ is positive, $d_j$ is real, and $m$ is a positive integer $\ge 2$. That is, we need only consider c.f.'s which are quasi-polynomials in $z = e^{it}$ with positive coefficients.

Lemma. If a polynomial with nonnegative coefficients admits a factorization into quasi-polynomials with nonnegative coefficients, it admits a factorization into (ordinary) polynomials having the same coefficients.

Proof. Let $P_0(z) = \prod_{i=1}^r P_i(z)$, where $P_0(z)$ is an ordinary polynomial with nonnegative coefficients and $P_i(z)$ is a quasi-polynomial for $i = 1, 2, \dots, r$. Also, let $m_i$ be the smallest exponent of $P_i(z)$ for $i = 0, 1, \dots, r$. Since the coefficients are nonnegative, no cancellation occurs, so $\sum_{i=1}^r m_i$ equals the smallest exponent $m_0$ of $P_0(z)$, a nonnegative integer. We have immediately $P_0'(z) = \prod_{i=1}^r P_i'(z)$, where $P_i'(z) = z^{-m_i}P_i(z)$ is a quasi-polynomial with smallest exponent 0 for $i = 0, 1, \dots, r$. As any exponent appearing on the right side of this equation must also appear on the left side, the $P_i'(z)$ must be ordinary polynomials. Q.E.D.

The distinguishing characteristic of the family $\mathcal{B}$ is that $\varphi_V(t)$ may be represented, by substituting $z = e^{it}$, as a product of binomial quasi-polynomials. If $\varphi_V(t)$ may also be expressed as a product of quasi-polynomials which are not all reducible to binomial quasi-polynomials, it will be established that $\mathcal{B}$ is not, in general, f.c. Consider the identity

(3) $$(z + 3)(z + 4)(z^3 + 8) = (z + 2)(z^4 + 5z^3 + 2z^2 + 4z + 48) = (z + 2)\,P_4(z).$$

Now, although $P_4(z)$ is in general reducible to $(z + 3)(z + 4)(z^2 - 2z + 4)$, it is irreducible into ordinary polynomials having nonnegative coefficients. By the lemma it is also irreducible into quasi-polynomials having nonnegative coefficients. If each parenthetic factor in (3) is divided by the sum of its coefficients and $z$ is replaced by $e^{it}$, the expression on the left side is the c.f. of a member of $\mathcal{B}$. The expression in the middle is a product of two c.f.'s, the second of which is not a member of $\mathcal{B}$.
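Identity (3) can be checked mechanically, together with the claim that $P_4(z)$ has no factorization into polynomials with nonnegative coefficients. The sketch below multiplies coefficient lists (lowest degree first). It then tries every grouping of the real irreducible factors $z + 3$, $z + 4$ and $z^2 - 2z + 4$ given in the text. Up to constant multiples, every real factorization of $P_4$ is such a grouping.

```python
from itertools import combinations

def polymul(a, b):
    """Product of two polynomials given as coefficient lists, lowest degree first."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def product(polys):
    out = [1]
    for poly in polys:
        out = polymul(out, poly)
    return out

left = product([[3, 1], [4, 1], [8, 0, 0, 1]])   # (z + 3)(z + 4)(z^3 + 8)
P4 = [48, 4, 2, 5, 1]                             # z^4 + 5z^3 + 2z^2 + 4z + 48
middle = polymul([2, 1], P4)                      # (z + 2) P4(z)

# Split the real irreducible factors of P4 into two nonempty groups and keep
# only splits in which both products have nonnegative coefficients.
factors = [[3, 1], [4, 1], [4, -2, 1]]
nonnegative_splits = []
for k in (1, 2):
    for idx in combinations(range(3), k):
        g = product([factors[i] for i in idx])
        h = product([factors[i] for i in range(3) if i not in idx])
        if min(g) >= 0 and min(h) >= 0:
            nonnegative_splits.append((g, h))
print(left == middle, product(factors) == P4, nonnegative_splits)
```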

On the other hand, if $\mathcal{B}$ is suitably restricted, it is f.c. Let $\mathcal{B}_{c,2c}$ denote the subfamily of $\mathcal{B}$ with $c_j = c$ or $2c$ for $j = 1, 2, \dots, n$. We have then

Theorem 1. The family $\mathcal{B}_{c,2c}$ is f.c. for any (positive) $c$.

Proof. If $c_j = 1$ or 2 for $j = 1, 2, \dots, n$, then

$$\psi(z) = \varphi_V(t) = A\prod_{j=1}^n (z^{c_j} + \vartheta_j)$$

is the canonical decomposition of $\psi(z)$ into linear and quadratic factors. As $\vartheta_j > 0$, it is clear that no matter how $\psi(z)$ is factored into ordinary polynomials, these must always be reducible to products of binomial factors with positive coefficients. With the lemma, this proves the theorem for the case $c = 1$. For arbitrary (positive) $c$, the transformation $y = z^c$ returns one to the case just examined.

Corollary 1. Let $\mathcal{B}'_{c,2c}$ denote the subfamily of $\mathcal{B}_{c,2c}$ wherein $p_j = p$ for $j = 1, 2, \dots, n$. Then $\mathcal{B}'_{c,2c}$ is f.c.

Proof. By Theorem 1, $\mathcal{B}'_{c,2c}$ is decomposable ($\mathcal{B}_{c,2c}$). That $\mathcal{B}'_{c,2c}$ is also f.c. follows directly from the fact that (for $c = 1$) all the roots of $\psi(z)$ must be equal to $-\vartheta$ or $\pm i\vartheta^{1/2}$.

From Corollary 1, it follows that the only factors of the classical binomial (Bernoulli) distributions are themselves binomial distributions. It suffices here to choose $a_j = 1$, $b_j = 0$, and $p_j = p$.

One might also define $\mathcal{B}''$ as that subfamily of $\mathcal{B}$ for which $p_j = p$ for $j = 1, 2, \dots, n$. However, it is simple to show via a counter-example that $\mathcal{B}''$ is not f.c.

In generalization of the preceding, we define for any integral $k \ge 2$ the general $k$-nomial family of distributions, say $\mathcal{U}_k$, as follows. Let $\{X_j\}$ be a sequence of independent random variables with

$$\Pr\{X_j = a_{ji}\} = p_{ji}, \qquad i = 1, 2, \dots, k,$$
$$0 < p_{ji} < 1, \qquad \sum_{i=1}^k p_{ji} = 1.$$

There is no loss of generality in supposing $a_{j1} > a_{j2} > \cdots > a_{jk}$ for all $j$. Then $V = \sum_{j=1}^n X_j$ will be a r.v. having the "general $k$-nomial distribution." We consider the case $k = 3$.

Theorem 2. If $a_{ji}$ forms, for each $j$, an arithmetic progression whose common difference is independent of $j$, and if $p_{j2}^2 < 4p_{j1}p_{j3}$ for all $j$, then $\mathcal{U}_3$ is f.c.

Proof. Let $b_{ji} = a_{ji} - a_{j3} > 0$ for $i = 1$ or 2. Then if $b_{j2} = b$, by hypothesis $b_{j1} = 2b$. As earlier, it suffices to consider

$$\varphi_V(t) = \prod_{j=1}^n (p_{j1}e^{2ibt} + p_{j2}e^{ibt} + p_{j3}).$$

But this is the canonical decomposition of a polynomial in $w = e^{ibt}$. In view of the positivity of the coefficients, and the lemma, $\mathcal{U}_3$ is necessarily f.c. The conditions $p_{j2}^2 < 4p_{j1}p_{j3}$ preclude trivial decompositions into binomial distributions.

The factor-closedness of $\mathcal{U}_3$ cannot be extended even to the case where the $a_{ji}$ are in arithmetic progression but the difference depends on $j$. It suffices to note the counter-example


(2° + 302° + 885967) (2° + 22 + 6)(z’ + 3z + 6) 
(2? + 5z + 1%)(2° 4+ 25g2? + 242 + 38)(2* + 1952" + z + 38). 


4. The multinomial distribution. The factorization problem, as well as (1) and (2), extends readily to the $m$-dimensional case, that is, to $m$ random variables or to a single vector random variable with $m$ components. Where $X$, $F(x)$, and $\varphi(t)$ were written previously, we need only substitute $(X_1, \dots, X_m)$, $F(x_1, \dots, x_m)$, and $\varphi(t_1, \dots, t_m)$. Cramér has shown [2] that the family of multivariate normal distributions is f.c.

We consider the classical multinomial distribution with $n$ independent repetitions of an experiment whose $m$ mutually exclusive and exhaustive outcomes $A_1, \dots, A_m$ have occurrence probabilities $p_1, \dots, p_m$. If $X_i$ is the r.v. denoting the number of occurrences of $A_i$ in the $n$ trials, then

$$\Pr\{(X_1 = x_1), \dots, (X_m = x_m)\} = \Big(n!\Big/\prod_{i=1}^m x_i!\Big)\,p_1^{x_1}p_2^{x_2}\cdots p_m^{x_m},$$

where $\sum_{i=1}^m x_i = n$ and $\sum_{i=1}^m p_i = 1$. Here

$$\varphi(t_1, \dots, t_m) = [p_1e^{it_1} + p_2e^{it_2} + \cdots + p_me^{it_m}]^n.$$

Let $z_j = e^{it_j}$ and $\psi(z_1, \dots, z_m) = \varphi(t_1, \dots, t_m)$. As before, the only possible factors of $\psi$ are of the form

(4) $$\sum r_{x_1\cdots x_m}\,z_1^{x_1}z_2^{x_2}\cdots z_m^{x_m} = \psi_i(z_1, z_2, \dots, z_m).$$

Again there is no loss of generality in supposing the $x_j$ to be nonnegative integers, that is, that $\psi_i$ is a polynomial rather than a quasi-polynomial. We now prove

Theorem 3. The family of (classical) multinomial distributions is f.c.

Proof. Analogous to (2), we have, where $\psi_i$ is of the form (4) for $i = 1$ or 2,

$$(p_1z_1 + p_2z_2 + \cdots + p_mz_m)^n = \psi_1\psi_2.$$
Since the irreducible factor $(p_1z_1 + \cdots + p_mz_m)$ is an $n$-fold factor of $\psi_1\psi_2$, it must be an $n_1$-fold factor of $\psi_1$ and an $n_2$-fold factor of $\psi_2$, with $n_1 + n_2 = n$, that is,

$$\psi_i = \Big(\sum_{j=1}^m p_jz_j\Big)^{n_i}Q_i(z_1, \dots, z_m), \qquad i = 1, 2.$$

Clearly, $Q_i$ = constant = 1, since $\psi_i(1, 1, \dots, 1) = \varphi_i(0, \dots, 0) = 1$. Finally, $0 < n_i < n$ if degenerate c.f.'s and c.d.f.'s are precluded, as earlier.

Slight generalizations of Theorem 3 are possible. The writer has proved that the family of (correlated) multivariate Poisson distributions is f.c., but this will not be given here.


5. Infinitely divisible families. Returning to the unidimensional case, F(x)
is called infinitely divisible (i.d.) if $[\varphi(t)]^{1/n}$ is a c.f. for every positive integer n.
Khintchine's form [6] of Lévy's formula [4] gives as a necessary and sufficient
condition for F(x) to be i.d. that

$$\log \varphi(t) = i\gamma t + \int_{-\infty}^{\infty} \left(e^{itu} - 1 - \frac{itu}{1 + u^2}\right) \frac{1 + u^2}{u^2}\, dG(u), \tag{5}$$

where $\gamma$ is real and $G(u)$ is bounded, monotonic nondecreasing and can be normalized
so that $G(u - 0) = G(u)$, $G(-\infty) = 0$, and $G(+\infty) = B$. (At $u = 0$ the integrand
is defined by continuity as $-t^2/2$.) Furthermore, the normalized representation is unique.
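To see how a single jump of G reduces (5), the following Python sketch evaluates the right side of (5) for a jump of size s at u = c; the function names are hypothetical. For c = 0 the limiting value $-t^2/2$ of the integrand gives the normal c.f. with variance s. For c ≠ 0 a short calculation gives $\log\varphi(t) = i(\gamma - s/c)t + \lambda(e^{itc} - 1)$ with $\lambda = s(1 + c^2)/c^2$, the c.f. of a shifted, rescaled Poisson variate; the code checks both reductions numerically.

```python
import cmath


def khintchine_integrand(t, u):
    """Integrand of (5) with respect to dG(u); at u = 0 it takes its limit -t**2/2."""
    if u == 0:
        return -t * t / 2
    return (cmath.exp(1j * t * u) - 1 - 1j * t * u / (1 + u * u)) * (1 + u * u) / (u * u)


def log_cf_single_jump(t, gamma, c, s):
    """log phi(t) from (5) when G has a single jump of size s > 0 at u = c."""
    return 1j * gamma * t + s * khintchine_integrand(t, c)


def log_cf_normal(t, mean, variance):
    """log c.f. of a normal law."""
    return 1j * mean * t - variance * t * t / 2


def log_cf_shifted_poisson(t, shift, scale, lam):
    """log c.f. of shift + scale * N, where N is Poisson with mean lam."""
    return 1j * shift * t + lam * (cmath.exp(1j * scale * t) - 1)


# Jump of size 0.8 at c = 2: Poisson mean 0.8 * 5 / 4 = 1.0, shift gamma - 0.4.
print(log_cf_single_jump(1.0, 0.1, 2.0, 0.8))
print(log_cf_shifted_poisson(1.0, 0.1 - 0.4, 2.0, 1.0))
```

The two printed values agree to rounding error, which is the content of the remark that case (b) is "closely related to the Poisson family".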

If G(u) is a step function with only a single jump point, that is

(a) $G(u) = 0$ for $u \le 0$ and $G(u) = \sigma^2$ for $u > 0$, or (b) $G(u) = 0$ for $u \le c$ and
$G(u) = \sigma^2$ for $u > c$, where $c \ne 0$,

then (a) yields the normal family of distributions while those in (b) are closely
related to the Poisson family. If $c = 1$ and $\gamma = \sigma^2$, then (b) is the Poisson
family.


Suppose that G(u) has n discontinuities $a_1 < a_2 < \cdots < a_n$ with saltuses
$b_1, b_2, \ldots, b_n$. Then $G(u) = G_n(u) = G_n(u; a_1, \ldots, a_n; b_1, \ldots, b_n)$ has a
corresponding i.d. c.f. $\varphi_n(t; a_1, \ldots, a_n; b_1, \ldots, b_n)$ and c.d.f. $F_n(x; a_1, \ldots, a_n; b_1, \ldots, b_n)$.

For any fixed $n = 1, 2, \ldots$, consider the family

$$\mathcal{F}_n = \{F_n(x; a_1, \ldots, a_n; b_1, \ldots, b_n)\}.$$

If $G_n(u)$ is a step function, denote the corresponding family of c.d.f.'s by $\mathcal{F}_n^*$.


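THEOREM 4. For any fixed $n = 2, 3, \ldots$, the i.d. families $\mathcal{F}_n$ and $\mathcal{F}_n^*$ are not
f.c.

PROOF. Let $b = \sum_{j=2}^{n} b_j$. Define

$$G''_{n-1}(u) = \begin{cases} 0, & u \le a_2, \\ \dfrac{1}{b}\sum_{j=2}^{k} b_j, & a_k < u \le a_{k+1},\ k = 2, \ldots, n\ (a_{n+1} = +\infty), \end{cases} \qquad G'_1(u) = \frac{G_n(u) - b\,G''_{n-1}(u)}{B - b}.$$

Further, let $\varphi'_1(t)$ be given by (5) with $G'_1(u)$ replacing $G(u)$, and define $\varphi''_{n-1}(t)$
analogously. Clearly, $G_n(u) = (B - b)G'_1(u) + bG''_{n-1}(u)$, whence

$$\varphi_n(t; a_1, \ldots, a_n; b_1, \ldots, b_n) = [\varphi'_1(t)]^{B-b}\,[\varphi''_{n-1}(t)]^{b}.$$

Since $G'_1(u)$ has only one saltus, the c.d.f. corresponding to $[\varphi'_1(t)]^{B-b}$, say $F'_1(x)$,
belongs to $\mathcal{F}_1$ (to $\mathcal{F}_1^*$ if $G_n$ is a step function) and not to $\mathcal{F}_n$. Thus $\mathcal{F}_n$ or $\mathcal{F}_n^*$
is not f.c. for $n \ge 2$. In particular, if $G_n$ is a step function, $F'_1(x)$ is a normal or (almost)
Poisson c.d.f.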


REFERENCES

[1] H. Cramér, "Über eine Eigenschaft der normalen Verteilungsfunktion," Math. Z., Vol. 41 (1936), pp. 405-414.

[2] H. Cramér, Random Variables and Probability Distributions, Cambridge Tracts in Mathematics, Cambridge, 1937.

[3] D. Raikov, "On the composition of Poisson laws," C. R. Acad. Sci. URSS, Vol. 14 (1937), pp. 8-11.

[4] P. Lévy, Théorie de l'Addition des Variables Aléatoires, Paris, 1937.

[5] P. Lévy, "L'arithmétique des lois des probabilités," J. Math. Pures Appl., Vol. 17 (1928), pp. 17-39.

[6] A. Khintchine, "Contribution à l'arithmétique des lois de distribution," Bull. Math. Univ. Moscou, Vol. 1 (1937), pp. 6-17.





ON THE CONVOLUTION OF DISTRIBUTIONS
By HENRY TEICHER
Purdue University


1. Summary. A systematic approach to distributions having the reproductive 
property (see [1] p. 171) is attempted, and necessary and sufficient conditions 
are given. The case of distributions depending on k (> 1) parameters is con- 
sidered; it need not be a straightforward generalization of the one-parameter 
case. 


2. Additively closed families of distributions. Let $D = D(\lambda)$ be an
Abelian semi-group under addition. In particular, denote by $D(I)$, $D(I+)$,
$D(r+)$, $D(R+)$, and $D(R+, 0)$ the semi-groups of integers, positive integers,
positive rationals, positive reals, and nonnegative reals, respectively. Let $D(r)$,
$D(R)$, $D(I+, 0)$ and $D(r+, 0)$ be defined analogously. The abbreviations c.d.f.
and c.f. will be used for cumulative distribution function and characteristic
function, respectively.

DEFINITION. A one-parameter family of c.d.f.'s $F(x; \lambda)$ with $\lambda \in D$, and D
as above, will be said to be additively closed or to belong to the class $C_1$ if, for any
two elements $F(x; \lambda_1)$ and $F(x; \lambda_2)$,

$$F(x; \lambda_1) * F(x; \lambda_2) = F(x; \lambda_1 + \lambda_2), \tag{1}$$

where $*$ denotes convolution.
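As a concrete member of $C_1$ (a standard example, not one taken from the text), the Poisson family with mean λ satisfies (1), with $f(t) = \exp(e^{it} - 1)$ in the notation of Theorem 1 below. The Python sketch convolves two truncated Poisson probability vectors, compares the result with the Poisson vector for $\lambda_1 + \lambda_2$, and checks the c.f. identity $[f(t)]^{\lambda_1}[f(t)]^{\lambda_2} = [f(t)]^{\lambda_1 + \lambda_2}$. The truncation point `kmax` is an arbitrary illustrative choice.

```python
import cmath
from math import exp, factorial


def poisson_pmf(lam, kmax):
    """Poisson probabilities P(N = k) for k = 0, ..., kmax."""
    return [exp(-lam) * lam ** k / factorial(k) for k in range(kmax + 1)]


def convolve(p, q):
    """Distribution of the sum of two independent nonnegative integer variates.

    Entries k <= min(len(p), len(q)) - 1 are exact even though p and q are truncated,
    because P(sum = k) only involves P(N = j) for j <= k.
    """
    n = min(len(p), len(q))
    return [sum(p[j] * q[k - j] for j in range(k + 1)) for k in range(n)]


def poisson_cf(t, lam):
    """[f(t)]**lam with f(t) = exp(e^{it} - 1): the c.f. of a Poisson variate with mean lam."""
    return cmath.exp(lam * (cmath.exp(1j * t) - 1))


left = convolve(poisson_pmf(1.3, 30), poisson_pmf(2.2, 30))
right = poisson_pmf(3.5, 30)
print(max(abs(a - b) for a, b in zip(left, right)))  # rounding error only
```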


Among the following results, Theorem 1 is known in one form or another
but is required here for a unified presentation. Theorems 2 and 4 are new.
Generally, the k-parameter case does not seem to have been considered previously.

THEOREM 1. If (i) $\lambda \in D(I+)$ or (ii) $\lambda \in D(r+)$, a necessary and sufficient condition
that a family of c.d.f.'s $F(x; \lambda)$ be additively closed, that is, that $F(x; \lambda) \in C_1$,
is that the corresponding family of c.f.'s is $\varphi(t; \lambda) = [f(t)]^{\lambda}$, where f(t) is a c.f. not
depending on λ. If (iii) $\lambda \in D(R+)$, and $\varphi(t; \lambda)$ is continuous in λ, the same
condition is again necessary and sufficient. In cases (ii) and (iii), f(t) is the c.f. of
an infinitely divisible distribution.


PROOF. The proof of sufficiency is trivial, both here and in the ensuing theorems,
so only necessity is proved. The three alternatives for λ are considered in turn.

(i) $\lambda \in D(I+)$. Let $f(t) = \varphi(t; 1)$. Translating and iterating (1), we have, for
any positive integer p,

$$\varphi(t; p) = \varphi(t; 1)\,\varphi(t; 1) \cdots \varphi(t; 1) = [f(t)]^p.$$


(ii) $\lambda \in D(r+)$. We have, from (1), $f(t) = \varphi(t; 1) = [\varphi(t; 1/p)]^p$. That is, the
pth root of f(t) is a c.f. for every positive integral p, whence f(t) is the c.f. of
an infinitely divisible (i.d.) distribution and hence never zero. (By the pth
root is meant that branch for which $f^{1/p}(0) = 1$, which is unambiguous since
$f(t) \ne 0$ for real t.) Again applying (1) we see that, for any positive integers
p and q,

$$\varphi(t; q/p) = [\varphi(t; 1/p)]^q = [f(t)]^{q/p}.$$


(iii) $\lambda \in D(R+)$. Since $\varphi(t; \lambda)$ is continuous in λ, it follows from (ii), by taking
a sequence of positive rational numbers approaching any positive real λ,
that $\varphi(t; \lambda) = [f(t)]^{\lambda}$.

If $\lambda \in D(R+)$ and the continuity assumption is removed, Theorem 1 is in
general untrue. For example, let $F(x; \lambda)$ be a family of normal distributions
with variance λ and mean g(λ), where g(λ) is a discontinuous solution of Cauchy's
functional equation $g(x) + g(y) = g(x + y)$. Then

$$\varphi(t; \lambda) = \exp\{itg(\lambda) - \tfrac{1}{2}\lambda t^2\}$$

is not of the form $[f(t)]^{\lambda}$ although $F(x; \lambda) \in C_1$.

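THEOREM 2. If $\varphi(t; \lambda)$ for $\lambda \in D(R+)$ is real-valued (for real t), a necessary and
sufficient condition (NSC) that $F(x; \lambda) \in C_1$ is that $\varphi(t; \lambda) = [f(t)]^{\lambda}$.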

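PROOF. The set of zeros of $\varphi(t; \lambda)$ is independent of λ. For if $\varphi(t_0; \lambda_1) = 0$
and $\lambda_2 > \lambda_1$, then

$$\varphi(t_0; \lambda_2) = \varphi(t_0; \lambda_2 - \lambda_1)\,\varphi(t_0; \lambda_1) = 0. \tag{2}$$

If $\lambda_2 < \lambda_1$ and n is a positive integer, $[\varphi(t_0; \lambda_1/n)]^n = \varphi(t_0; \lambda_1) = 0$, whence
$\varphi(t_0; \lambda_1/n) = 0$ for every positive integer n. But for sufficiently large n, we have
$\lambda_2 > \lambda_1/n$. Applying (2), we deduce $\varphi(t_0; \lambda_2) = 0$.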

For λ = r, a positive rational number, we have from Theorem 1 that $\varphi(t; r) = [f(t)]^r$
with f(t) never zero. It follows from the above that $\varphi(t; \lambda)$ is never zero.
Consequently, since $\varphi(0; \lambda) = 1$ and $\varphi(t; \lambda)$ is real and continuous in t for every λ,
$\varphi(t; \lambda)$ is never negative; indeed it is strictly positive.

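Now $\psi(t; \lambda) = \log \varphi(t; \lambda)$ is well defined, and, from the translated form of
(1), satisfies Cauchy's functional equation in λ. As

$$0 < \varphi(t; \lambda) = |\varphi(t; \lambda)| \le 1,$$

$\psi(t; \lambda)$ is nonpositive, whence the only solution is the continuous one $\psi(t; \lambda) = K(t)\lambda$
(a solution of Cauchy's equation that is bounded on one side is necessarily linear).
Thus, for all real λ > 0,

$$\varphi(t; \lambda) = \exp\{K(t)\lambda\} = [h(t)]^{\lambda}.$$

Taking λ = 1, we have $\varphi(t; 1) = h(t) = f(t)$, which proves the theorem.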
DEFINITION. Let $\lambda_j$ be an element of the Abelian semi-group (additive) $D_j$ for
$j = 1, 2, \ldots, k$. A k-parameter family of c.d.f.'s will be said to be additively
closed or to belong to the class $C_k$ if for any two members $F(x; \lambda_1^{(1)}, \ldots, \lambda_k^{(1)})$
and $F(x; \lambda_1^{(2)}, \ldots, \lambda_k^{(2)})$,

$$F(x; \lambda_1^{(1)}, \ldots, \lambda_k^{(1)}) * F(x; \lambda_1^{(2)}, \ldots, \lambda_k^{(2)}) = F(x; \lambda_1^{(1)} + \lambda_1^{(2)}, \ldots, \lambda_k^{(1)} + \lambda_k^{(2)}). \tag{3}$$

There may be a set of dormant parameters which are unaffected by the
convolution, but these may simply be ignored.

In generalization of Theorem 1, we have:

THEOREM 3. Let $F(x; \lambda_1, \ldots, \lambda_k)$ be a k-parameter family of c.d.f.'s with
$\lambda_j \in D_j$, where $D_j = D_j(I+, 0)$, $D_j(r+, 0)$ or $D_j(R+, 0)$. Further, let $\varphi(t; \lambda_1, \ldots, \lambda_k)$
be continuous in all $\lambda_j$ for which the corresponding $D_j = D_j(R+, 0)$. Then a NSC
that $F \in C_k$ is that

$$\varphi(t; \lambda_1, \lambda_2, \ldots, \lambda_k) = \prod_{j=1}^{k} [f_j(t)]^{\lambda_j},$$

where each $f_j(t)$ is a c.f. independent of all $\lambda_j$, and is i.d. provided the corresponding
$D_j$ is $D_j(r+, 0)$ or $D_j(R+, 0)$.

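PROOF. As in Theorem 1, $\varphi(t; 0, \ldots, 0, \lambda_j, 0, \ldots, 0) = \varphi_j(t; \lambda_j) = [f_j(t)]^{\lambda_j}$.
Since, by (3), $F(x; \lambda_1, \ldots, \lambda_k)$ is the convolution of the k c.d.f.'s
$F(x; 0, \ldots, 0, \lambda_j, 0, \ldots, 0)$, it follows that

$$\varphi(t; \lambda_1, \ldots, \lambda_k) = \prod_{j=1}^{k} \varphi_j(t; \lambda_j) = \prod_{j=1}^{k} [f_j(t)]^{\lambda_j}.$$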


The inclusion of the value zero in each domain $D_j$ immediately implies that
each $f_j(t)$ is itself a c.f. The question arises whether this is necessarily so if zero
is deleted. Provided the product space $D_1 \times D_2 \times \cdots \times D_k$ is suitably altered,
the answer is in the negative.

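THEOREM 4. Let $F(x; \lambda_1, \lambda_2)$ be a two-parameter family of c.d.f.'s where $\lambda_1 \in D(r+)$
and $\lambda_2 \in D(r)$, with $\lambda_1 \ge |\lambda_2|$ defining the parameter space. A NSC
that $F(x; \lambda_1, \lambda_2) \in C_2$ is that

$$\varphi(t; \lambda_1, \lambda_2) = \prod_{j=1}^{2} [f_j(t)]^{\lambda_j},$$

where $f_2(t)$ is not necessarily a c.f.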
PROOF. Since for any positive integer n,

$$[\varphi(t; 1/n, 1/n)]^n = \varphi(t; 1, 1) = r(t) \quad \text{(say)},$$

r(t) is an i.d. c.f., and $\varphi(t; p/n, p/n) = [r(t)]^{p/n}$ for any positive integers p and n.
Similarly,

$$[\varphi(t; 1/n, 0)]^n = \varphi(t; 1, 0) = f_1(t) \quad \text{(say)},$$

where $f_1(t)$ is an i.d. c.f. Hence $\varphi(t; \lambda_1, 0) = [f_1(t)]^{\lambda_1}$ for $\lambda_1 \in D(r+)$. Let
$f_2(t) = r(t)/f_1(t)$. Then $f_2(t)$ is defined and nonzero for all real t.
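Now if $\lambda_2 > 0$ and $\lambda_1 = \lambda_2$, we have

$$\varphi(t; \lambda_1, \lambda_2) = \varphi(t; \lambda_1, \lambda_1) = [r(t)]^{\lambda_1} = [f_1(t)]^{\lambda_1}[f_2(t)]^{\lambda_2}.$$

If $\lambda_2 > 0$ but $\lambda_1 \ne \lambda_2$,

$$\varphi(t; \lambda_1, \lambda_2) = \varphi(t; \lambda_1 - \lambda_2, 0)\,\varphi(t; \lambda_2, \lambda_2) = \prod_{j=1}^{2} [f_j(t)]^{\lambda_j}.$$

Furthermore,

$$\varphi(t; \lambda_1, \lambda_2)\,\varphi(t; \lambda_1, -\lambda_2) = \varphi(t; 2\lambda_1, 0).$$

Substituting in this last equation and solving, we find

$$\varphi(t; \lambda_1, -\lambda_2) = [f_1(t)]^{\lambda_1}[f_2(t)]^{-\lambda_2},$$

completing the proof. It is clear from the definition that $f_2(t)$ need not be a
c.f.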
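The following example illustrates Theorem 4. Define $\varphi_j(t) = \exp\{a_j(e^{it} - 1)\}$
with $a_1 > 0$ and $a_2 \ge 0$, and rational for j = 1 or 2. Let $\lambda_1 = a_1 + a_2$ and
$\lambda_2 = a_1 - a_2$, with

$$\varphi(t; \lambda_1, \lambda_2) = \varphi_1(t)\,\varphi_2(-t) = [e^{\cos t - 1}]^{\lambda_1}[e^{i \sin t}]^{\lambda_2}, \tag{4}$$

so that (4) is the c.f. of the difference of independent Poisson variates with means $a_1$ and $a_2$.
The parameter space is given by $\lambda_1 \in D(r+)$ and $\lambda_2 \in D(r)$, with $\lambda_1 \ge |\lambda_2|$. Finally,
$\exp\{i \sin t\}$ cannot be a c.f., as

$$\exp\{i \sin t\} = 1 + it - \tfrac{1}{2}t^2 + o(t^2), \tag{5}$$

which would imply that the corresponding r.v. had unit mean and zero variance
and hence (by the uniqueness theorem for c.f.'s) a c.f. equal to $\exp\{it\}$.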
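The example can be checked numerically. The Python sketch below (the helper names are illustrative) confirms that $\varphi_1(t)\varphi_2(-t)$ agrees with the right side of (4), and estimates the first two moments implied by $\exp\{i \sin t\}$ from central finite differences at t = 0, using $\varphi'(0) = iE(X)$ and $\varphi''(0) = -E(X^2)$. Both estimates come out as 1, so the implied variance is 0, as stated after (5).

```python
import cmath
import math


def phi_poisson(t, a):
    """c.f. exp{a(e^{it} - 1)} of a Poisson variate with mean a."""
    return cmath.exp(a * (cmath.exp(1j * t) - 1))


def phi_example(t, lam1, lam2):
    """Right side of (4): [e^{cos t - 1}]^lam1 * [e^{i sin t}]^lam2."""
    return cmath.exp(lam1 * (math.cos(t) - 1)) * cmath.exp(1j * lam2 * math.sin(t))


def moments_from_cf(phi, h=1e-4):
    """Estimate E(X) and E(X^2) from phi'(0) = i E(X) and phi''(0) = -E(X^2)."""
    d1 = (phi(h) - phi(-h)) / (2 * h)
    d2 = (phi(h) - 2 * phi(0.0) + phi(-h)) / (h * h)
    return (d1 / 1j).real, (-d2).real


# a_1 = 1.5, a_2 = 0.5, so lambda_1 = 2 and lambda_2 = 1.
print(abs(phi_poisson(0.9, 1.5) * phi_poisson(-0.9, 0.5) - phi_example(0.9, 2.0, 1.0)))
print(moments_from_cf(lambda t: cmath.exp(1j * math.sin(t))))  # approximately (1.0, 1.0)
```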

The proof of the following generalization of Theorem 4 is very similar and 
will not be given. 

THEOREM 5. Let $F(x; \lambda_1, \lambda_2, \ldots, \lambda_k)$ be a k-parameter family of c.d.f.'s,
where $\lambda_1 \in D(r+)$ and $\lambda_j \in D(r+, 0)$ for $j \ge 2$, with $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_k$ defining
the parameter space. A NSC that $F(x; \lambda_1, \lambda_2, \ldots, \lambda_k) \in C_k$ is

$$\varphi(t; \lambda_1, \lambda_2, \ldots, \lambda_k) = \prod_{j=1}^{k} [f_j(t)]^{\lambda_j},$$

where $f_j(t)$ is not necessarily a c.f. for j > 1.

The last two theorems could be extended to real values of λ under suitable
assumptions.

NOTES


SEQUENTIAL PROCEDURES THAT CONTROL THE INDIVIDUAL 
PROBABILITIES OF COMING TO THE VARIOUS DECISIONS 


By LIONEL WEISS
University of Virginia


1. Summary. We consider cases where we have a finite number of decisions
and a finite number of possible distributions, and we confine attention to
procedures which have zero probability of continuing beyond the Nth observation,
where N is a given positive integer. We find a class C of procedures such that,
given any procedure R, there is a member of C, say R′, such that the probabilities
of coming to the various decisions under the various distributions when using
R′ are at least as desirable as when using R, and such that we are at least as
likely to take fewer than n observations under R′ as under R, for any n. Various
extensions are indicated.


2. Introduction. Since the discussion to follow leans very heavily on the 
results of a previous paper [1], we summarize briefly these results. 

Let x be the generic point of a Euclidean space Z, and $F_1(x), \ldots, F_m(x)$ be
m given cumulative probability distributions on Z. The statistician is confronted
with an observation on the chance variable X which is distributed in Z according
to an unknown one of $F_1, \ldots, F_m$. On the basis of this observation he has
to make one of L decisions, say $d_1, \ldots, d_L$. Let s be a positive integer and
$W_{ijk}(x)$, for $i = 1, \ldots, m$; $j = 1, \ldots, L$; $k = 1, \ldots, s$, be measurable
functions of x such that

$$\int_Z |W_{ijk}(x)|\, dF_i(x) < \infty.$$
Zz 


A randomized decision function, often called "test" for short, and generically
designated by η(x), is defined as $\eta(x) = [\eta_1(x), \ldots, \eta_L(x)]$, where

(a) η(x) is defined for all x;

(b) $0 \le \eta_j(x)$ for $j = 1, \ldots, L$;

(c) $\sum_{j=1}^{L} \eta_j(x) = 1$ identically in x;

(d) $\eta_j(x)$ is measurable for $j = 1, \ldots, L$.

Let

$$r_{ik} = \int_Z \Big(\sum_{j=1}^{L} \eta_j(x) W_{ijk}(x)\Big)\, dF_i(x), \qquad r^s = (r_{ik}), \quad i = 1, \ldots, m;\ k = 1, \ldots, s.$$

Thus to each η(x) there corresponds the sth order risk point $r^s$. The test T′ with
sth order risk point $r'^s = (r'_{ik})$ will be said to be uniformly better (s) than the test T″
with sth order risk point $r''^s = (r''_{ik})$ if $r'_{ik} \le r''_{ik}$ for every i and k, with strict
inequality for at least one pair (i, k). A test T will be called admissible
(s) if there exists no test uniformly better (s) than T. A class $C_0$ of tests will be
called complete (s) if, for any test T′ not in $C_0$, there exists a test T in $C_0$ which
is uniformly better (s) than T′.

Any set $\xi = (\xi_{ik})$, for $i = 1, \ldots, m$ and $k = 1, \ldots, s$, of nonnegative numbers
which add to unity (a convenient normalization) will be called an a priori
distribution (s). A Bayes' solution (s) with respect to ξ is a test $T_0$ which minimizes
$\sum_{i,k} \xi_{ik} r_{ik}(T)$ with respect to all tests T.

Let $f_i$ be the density function of $F_i$ with respect to a measure μ, with respect
to which all $F_i$ are absolutely continuous. There is always such a measure (for example,
$\mu = F_1 + \cdots + F_m$). To construct an sth order Bayes' solution with respect to $\xi = (\xi_{ik})$
one may proceed as follows: $\eta_j(x) = 0$ for all $j = 1, \ldots, L$ for which

$$\sum_{i=1}^{m} \sum_{k=1}^{s} \xi_{ik} f_i(x) W_{ijk}(x)$$

is not a minimum with respect to j; $\eta_j(x)$ is defined arbitrarily between 0 and
1, inclusive, for all other j, provided only that every component of the resulting
η(x) is measurable and the sum is always 1.
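For a finite sample space with μ taken as counting measure (an assumption made only for illustration), the construction above reduces to comparing L numbers at each point. The Python sketch below, with hypothetical helper names, builds one such Bayes' solution (s), splitting ties evenly (one of the arbitrary choices the text allows), computes its risk point $r_{ik}$, and checks that no nonrandomized test has a smaller value of $\sum_{i,k} \xi_{ik} r_{ik}$.

```python
from itertools import product


def bayes_solution(xi, f, W, tol=1e-12):
    """One sth order Bayes' solution on a finite sample space (counting measure).

    xi[i][k]: a priori weights; f[i][x]: density of F_i at point x;
    W[i][j][k][x]: the functions W_ijk. Returns eta[x][j].
    """
    m, L, s = len(f), len(W[0]), len(xi[0])
    eta = []
    for x in range(len(f[0])):
        scores = [sum(xi[i][k] * f[i][x] * W[i][j][k][x] for i in range(m) for k in range(s))
                  for j in range(L)]
        best = min(scores)
        winners = [j for j in range(L) if scores[j] - best <= tol]
        # Minimizing decisions may share the probability arbitrarily; split evenly here.
        eta.append([1 / len(winners) if j in winners else 0.0 for j in range(L)])
    return eta


def risk_point(eta, f, W):
    """r_ik = sum over x and j of eta_j(x) * W_ijk(x) * f_i(x)."""
    m, L, s = len(f), len(W[0]), len(W[0][0])
    return [[sum(eta[x][j] * W[i][j][k][x] * f[i][x] for x in range(len(f[0])) for j in range(L))
             for k in range(s)] for i in range(m)]


def bayes_risk(xi, r):
    """The criterion sum_{i,k} xi_ik r_ik."""
    return sum(xi[i][k] * r[i][k] for i in range(len(r)) for k in range(len(r[0])))


def deterministic_tests(L, npts):
    """All nonrandomized tests: each point x is assigned exactly one decision."""
    for choice in product(range(L), repeat=npts):
        yield [[1.0 if j == c else 0.0 for j in range(L)] for c in choice]


# Two distributions on three points, two decisions, s = 1, zero-one loss.
f = [[0.5, 0.3, 0.2], [0.1, 0.3, 0.6]]
W = [[[[0.0 if j == i else 1.0] * 3] for j in range(2)] for i in range(2)]
xi = [[0.5], [0.5]]
print(bayes_solution(xi, f, W))  # [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
```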

The only other result of the previous paper [1] needed for this one is the complete
class theorem: Every admissible (s) test is a Bayes' solution (s) with respect
to some a priori distribution (s). Hence the class of Bayes' solutions (s) is complete
(s).

The present paper deals with the case where there is a finite number of terminal
decisions, $d_1, \ldots, d_L$, one of which must be chosen, and a finite number of
possible distributions for the vector chance variable $X = (X_1, \ldots, X_N)$. We
observe X sequentially, that is, first we observe $X_1$, then decide whether to
observe $X_2$ or choose a terminal decision on the basis of $X_1$ alone, etc. Each
component $X_i$ of X may itself be a vector chance variable, and $X_1, \ldots, X_N$
are not necessarily independent. For each distribution $F_i(x)$, we assume that
the terminal decisions are divided into two mutually exclusive, exhaustive, and
nonempty classes, one of "favorable" decisions (those we prefer to make when
$F_i(x)$ is the true distribution) and the rest "unfavorable".

At various points in the discussion to follow, certain functions are well-defined 
except on sets of measure zero. The existence of these exceptional sets is of no 
consequence, and will not be specifically mentioned. As usual, P(A | B) will 
denote the conditional probability of A given B. 


3. A class of optimum decision procedures. Suppose that we are examining
some specific decision procedure R. Once R tells us to stop sampling after observing
$X_n$, R chooses a terminal decision in some way. We replace this way
by a Bayes' solution (L) to the decision problem, where the space is the set of
all points $(x_1, \ldots, x_n)$. The possible distributions $F_1, \ldots, F_m$ are such that,
over any Borel set S in the space of $(x_1, \ldots, x_n)$, the integral $\int_S dF_i$ is the
probability that sampling stops at a point of S, using R and $F_i$ true, given
that sampling stops with $x_n$. The decisions are $d_1, \ldots, d_L$. The function
$W_{ijk}(x_1, \ldots, x_n)$ is 0 if $d_k$ is favorable relative to $F_i$ and j = k, or if $d_k$ is unfavorable
and j ≠ k; it is 1 otherwise, that is, if $d_k$ is favorable and j ≠ k, or if $d_k$ is
unfavorable and j = k. Thus $r_{ik}$ is one minus the probability of choosing $d_k$ under $F_i$
when $d_k$ is favorable, and is that probability itself when $d_k$ is unfavorable. The complete
class theorem quoted above shows that this Bayes' solution (L) can be so chosen that the
probabilities of all the unfavorable decisions are no greater than they were when using R,
and the probabilities of all the favorable decisions are at least as great as they were when
using R, under all distributions.

We note that we have not yet changed the stopping rule, which is still given
by the originally specified procedure R. We also note that the a priori distributions
(L) will in general be different for different n. Let this a priori distribution
(L) as a function of n be denoted by ${}_n\xi = ({}_n\xi_{ik})$ for $i = 1, \ldots, m$ and
$k = 1, \ldots, L$. We denote by

${}_nF_i(x_1, \ldots, x_n)$ the marginal cumulative distribution function given by
$F_i(x_1, \ldots, x_N)$ for $(X_1, \ldots, X_n)$;

${}_nf_i(x_1, \ldots, x_n)$ the density function of ${}_nF_i(x_1, \ldots, x_n)$ with respect to a measure
$\mu_n$ with respect to which all ${}_nF_i$ are absolutely continuous; and

$s(x_1, \ldots, x_n)$ the probability under the stopping rule given by R that sampling
will be stopped exactly after observing $X_n$ when the first n observed values are
$(x_1, \ldots, x_n)$.

Then we have that the density function for $F_i(x_1, \ldots, x_n)$ is

$$\frac{s(x_1, \ldots, x_n)\, {}_nf_i(x_1, \ldots, x_n)}{\int s(x_1, \ldots, x_n)\, {}_nf_i(x_1, \ldots, x_n)\, d\mu_n}.$$

If the denominator of this expression is zero for some value of i, then we consider
that in this particular problem of choosing a terminal decision, $F_i$ is not
one of the possible distributions, and we modify our class of possible decision
procedures accordingly. Now we note that according to the explicit construction
of a Bayes' solution (L) given in Section 2, the Bayes' solution (L) with respect
to ${}_n\xi$ that we are using is identical with a Bayes' solution (L) to the problem
with the same decisions and the same functions $W_{ijk}(x)$, but with distributions
given by the density functions ${}_nf_1, \ldots, {}_nf_m$ (leaving out any corresponding
to zero denominators in the fraction above) and with a priori distribution
${}_n\xi' = ({}_n\xi'_{ik})$, where

$${}_n\xi'_{ik} = C\, \frac{{}_n\xi_{ik}}{\int s(x_1, \ldots, x_n)\, {}_nf_i(x_1, \ldots, x_n)\, d\mu_n},$$

C being a normalizing constant, and it being understood that any fraction with
a zero denominator is to be zero. (Wherever $s(x_1, \ldots, x_n) > 0$, this common factor
does not affect which j minimizes the criterion of Section 2.) Below we shall modify our
stopping rule, but until further notice, whenever it is decided to stop sampling with $X_n$
under any stopping rule, we make our terminal decision by means of the Bayes' solution (L)
with respect to the a priori distribution ${}_n\xi'$, where the distributions are
given by the densities ${}_nf_1, \ldots, {}_nf_m$ just described.
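The equivalence asserted above can be checked on a finite sample space. In the Python sketch below (counting measure, all denominators assumed positive, helper names hypothetical), the criterion of Section 2 is minimized once with the stopped densities $s\,{}_nf_i / \int s\,{}_nf_i\,d\mu_n$ and prior ${}_n\xi$, and once with the densities ${}_nf_i$ and the reweighted prior ${}_n\xi'$. At every point where s > 0 the two sets of minimizing decisions coincide. Exact fractions are used so that ties are detected reliably.

```python
from fractions import Fraction


def argmin_set(xi, dens, W, x):
    """Decisions j minimizing sum_i sum_k xi[i][k] * dens[i][x] * W[i][j][k][x]."""
    m, L, s = len(dens), len(W[0]), len(xi[0])
    scores = [sum(xi[i][k] * dens[i][x] * W[i][j][k][x] for i in range(m) for k in range(s))
              for j in range(L)]
    best = min(scores)
    return {j for j in range(L) if scores[j] == best}


def stopped_densities(stop_prob, f):
    """Densities s * f_i / sum_x s * f_i (counting measure); every denominator assumed positive."""
    norms = [sum(sp * fx for sp, fx in zip(stop_prob, row)) for row in f]
    dens = [[sp * fx / nrm for sp, fx in zip(stop_prob, row)] for row, nrm in zip(f, norms)]
    return dens, norms


def reweighted_prior(xi, norms):
    """xi'_ik = C * xi_ik / norms_i, with C chosen so that the weights sum to one."""
    raw = [[w / nrm for w in row] for row, nrm in zip(xi, norms)]
    total = sum(sum(row) for row in raw)
    return [[w / total for w in row] for row in raw]


f = [[Fraction(5, 10), Fraction(3, 10), Fraction(2, 10)],
     [Fraction(1, 10), Fraction(3, 10), Fraction(6, 10)]]
sp = [Fraction(1), Fraction(1, 2), Fraction(1, 4)]  # illustrative stopping probabilities s(x)
dens, norms = stopped_densities(sp, f)
print(reweighted_prior([[Fraction(1, 3)], [Fraction(2, 3)]], norms))  # [[2/9], [7/9]] as Fractions
```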

Now we construct a (possibly) new stopping rule. For a given n between 1
and N - 1 inclusive, to decide whether or not to stop after observing $X_n$, let
us assume that we have already described a rule for stopping after observing
$X_{n+1}, X_{n+2}, \ldots, X_{N-1}$, while we use the stopping rule given by R
before we reach $X_n$. We look upon the problem of deciding whether to continue
sampling after observing $(x_1, \ldots, x_n)$ as a decision problem with two possible
decisions: $D_1$, stop sampling; and $D_2$, continue sampling.

We apply the complete class theorem quoted above to this case with the decisions
$D_1$ and $D_2$. The distributions, given by the density functions
${}_nf'_1(x_1, \ldots, x_n), \ldots, {}_nf'_m(x_1, \ldots, x_n)$, are defined as follows. Let $t(x_1, \ldots, x_n)$ denote
the probability that sampling will not be stopped before observing $X_n$ when the
procedure R is used and the first n observed values are $(x_1, \ldots, x_n)$. Then set

$${}_nf'_i(x_1, \ldots, x_n) = \frac{t(x_1, \ldots, x_n)\, {}_nf_i(x_1, \ldots, x_n)}{\int t(x_1, \ldots, x_n)\, {}_nf_i(x_1, \ldots, x_n)\, d\mu_n}.$$


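The functions $W_{ijk} = W_{ijk}(x_1, \ldots, x_n)$, for $i = 1, \ldots, m$; $j = 1, 2$; and
$k = 1, \ldots, N - n + L$, are defined as follows:

for $k = 1$,
$W_{ij1} = j - 1$ for all i, all $(x_1, \ldots, x_n)$;

for $k = 2, \ldots, N - n$,
$W_{i1k} = 0$ for all i, all $(x_1, \ldots, x_n)$,
$W_{i2k} = P[\text{not stopping before } X_{n+k} \text{ under } F_i \mid (x_1, \ldots, x_n)]$;

for $k = N - n + 1, \ldots, N - n + L$,
$W_{ijk} = P[\text{not making } d_{k-N+n}, \text{ after } D_j \text{ is made, under } F_i \mid (x_1, \ldots, x_n)]$ if $d_{k-N+n}$ is favorable relative to $F_i$,
$W_{ijk} = P[\text{making } d_{k-N+n}, \text{ after } D_j \text{ is made, under } F_i \mid (x_1, \ldots, x_n)]$ if $d_{k-N+n}$ is unfavorable relative to $F_i$.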


All the conditional probabilities introduced here in defining $W_{ijk}(x)$ are well
defined, since we assumed that we have already decided how to proceed once
we decide either to stop after observing $X_n$ or to continue sampling after $X_n$.
By the complete class theorem quoted above, there is an a priori distribution
(N - n + L) for this two-decision problem, with a Bayes' solution (N - n + L)
no worse than the solution given by the procedure R with respect to the
probabilities of making favorable or unfavorable terminal decisions and with respect
to the distribution of the sample size required to come to a decision, under
any of the possible distributions.

We apply this construction first to the problem of deciding whether to stop
after observing $X_{N-1}$. Let ${}_{N-1}\xi$ be the a priori distribution (L + 1) to be used.
We define an a priori distribution ${}_{N-1}\xi'$ and a Bayes' solution for it in terms of
${}_{N-1}\xi$, just as we defined ${}_n\xi'$ in terms of ${}_n\xi$ above, using $t(x_1, \ldots, x_{N-1})$ instead
of $s(x_1, \ldots, x_{N-1})$. Then we agree that no matter how the stopping rule may be
modified before reaching $X_{N-1}$, we will decide whether to continue sampling
after reaching $X_{N-1}$ by using this Bayes' solution (L + 1) with respect to ${}_{N-1}\xi'$.
Here the decisions are $D_1$ and $D_2$, the distributions are given by the densities
${}_{N-1}f_1(x_1, \ldots, x_{N-1}), \ldots, {}_{N-1}f_m(x_1, \ldots, x_{N-1})$, and the $W_{ijk}(x)$ are as given above.
Now we apply the construction to the problem of deciding whether to stop sampling
after observing $X_{N-2}$, carrying out similar steps and getting an a priori
distribution ${}_{N-2}\xi'$, etc., down to deciding whether to stop sampling after
observing $X_1$, getting an a priori distribution ${}_1\xi'$.

The net result of all this is to give us a decision procedure R′ which uses
higher order Bayes' solutions to decide whether to continue sampling at each
stage, and also to choose a terminal decision after it is decided to stop sampling.
This R′ is such that for any distribution $F_i$, the probability of making
any given favorable decision is at least as high using R′ as using R, and the
probability of making any particular unfavorable decision is no higher under
R′ than under R. Also, under any $F_i$, the cumulative distribution function of
the sample size required to come to a terminal decision when using R′ is never
below the corresponding function when using R.

Now let C be the class of decision procedures we get by varying all the various
a priori distributions used in defining R′ (both those used in setting the stopping
rule and those used in choosing terminal decisions) in all possible ways, modifying
the Bayes' solutions accordingly. Since R was arbitrary, we have the
theorem that for any decision procedure R there is a member of C, say R′,
enjoying the same advantages over R as were described in the preceding paragraph.


4. Extensions. The individual probabilities of making terminal decision
$d_j$ under distribution $F_i$ after no more than n observations can be controlled
for all i, j, and n in a manner similar to that described above, by defining the
proper set of functions $W_{ijk}(x)$ at each stage. How this would be done is clear,
and is not elaborated here.

Obvious analogues to the above results hold for cases, such as two-sample 
problems, where the size of the second sample is a bounded function of the 
observations in the first sample. Choice of the size of the second sample is treated 
as a decision problem, just as whether to continue sampling was treated as a 
decision problem above. 

The results of [2] can be extended to the present case, so that when we have 
atomless distributions our class C above need not contain procedures employing 
randomization. 

Under certain conditions, a case where there is an infinite number of distri- 
butions and/or decisions and/or possible sample sizes can be approximated 
arbitrarily closely by a case where these three numbers are finite [3]. 

Finally, cases in which it is desired to control not the whole distribution of
the sample size required to come to a terminal decision, but only certain aspects
of it (for example, its expected value), can be handled as above, using the proper
$W_{ijk}(x)$ at each stage.

ON A CHARACTERISATION OF THE GAMMA DISTRIBUTION


By R. G. LAHA
Indian Statistical Institute


An intrinsic property of the gamma distribution, as proved by Pitman [1], is
that if $X_1, X_2, \ldots, X_n$ are n identically distributed independent gamma variates
with the distribution function

$$dF(X) = \frac{1}{\Gamma(p)}\, e^{-X} X^{p-1}\, dX, \qquad 0 \le X < \infty,$$

then the sum $X_1 + X_2 + \cdots + X_n$ is distributed independently of any function
$g(X_1, X_2, \ldots, X_n)$ satisfying $g(X_1, X_2, \ldots, X_n) = g(\lambda X_1, \lambda X_2, \ldots, \lambda X_n)$
for any nonzero real λ. That is, $g(X_1, X_2, \ldots, X_n)$ should be a function independent
of scale. In the present paper the converse theorem is proved for a
particular class of g-functions.

THEOREM. Let $X_1, X_2, \ldots, X_n$ be n identically distributed independent random
variables with a finite second moment. If the conditional expectation of the ratio of
two quadratic forms $(\sum a_{ij} X_i X_j)/(\sum X_i)^2$ (where the elements of the matrix $(a_{ij})$
satisfy the relation $\sum a_{ii} \ne \sum a_{ij}/n$), for fixed sum $X_1 + X_2 + \cdots + X_n$, is equal
to its unconditional expectation, then each X follows the gamma distribution.

For a matrix $A = (a_{ij})$ where the relation $\sum a_{ii} = \sum a_{ij}/n$ holds, the method
suggested does not lead to any solution of the problem. It is also interesting to
note in this connection that the stronger assumption of stochastic independence
of the sum $X_1 + X_2 + \cdots + X_n$ and $g(X_1, X_2, \ldots, X_n)$ is not necessary for
this particular class of g-functions.

The following lemma is required for the proof of the Theorem. 

LEMMA. If u and v are two random variables such that for fixed v, the conditional
expectation of u/f(v), where f(v) is a function of v, is equal to its unconditional
expectation (provided it exists), then

$$E(u e^{itv}) = E\{u/f(v)\} \cdot E\{f(v) e^{itv}\}.$$


Received 9/14/53, revised 5/28/54.

The proof of this lemma is very simple. If x and y are two variates such that
the conditional expectation of y for fixed x is equal to its unconditional expectation,
then $E_x(y) = E(y)$. Multiplying both sides by $\phi(x) e^{itx}$ and taking expectation
with respect to x, we get very easily

$$E\{y\, \phi(x) e^{itx}\} = E(y) \cdot E\{\phi(x) e^{itx}\}.$$

Putting $y = u/f(v)$, $\phi(x) = f(v)$, $x = v$, this becomes

$$E\{(u/f(v))\, f(v) e^{itv}\} = E\{u/f(v)\} \cdot E\{f(v) e^{itv}\},$$

which may be written as

$$E\{u e^{itv}\} = E\{u/f(v)\} \cdot E\{f(v) e^{itv}\},$$

and this proves the lemma.


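PROOF OF THEOREM. Using this lemma with $u = \sum a_{ij} X_i X_j$, $v = \sum X_j$, and
$f(v) = (\sum X_j)^2$, we have

$$E\Big\{\Big(\sum a_{ij} X_i X_j\Big) e^{it(X_1 + \cdots + X_n)}\Big\} = E\Big\{\frac{\sum a_{ij} X_i X_j}{(\sum X_j)^2}\Big\} \cdot E\Big\{\Big(\sum X_j\Big)^2 e^{it(X_1 + \cdots + X_n)}\Big\}. \tag{1}$$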

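Let $g(t) = E(e^{itX})$ represent the characteristic function of the distribution of X.
After some algebraic simplifications (using, with $S = X_1 + \cdots + X_n$, the relations
$E\{X_i^2 e^{itS}\} = -g''(t)[g(t)]^{n-1}$ and $E\{X_i X_j e^{itS}\} = -[g'(t)]^2 [g(t)]^{n-2}$ for $i \ne j$),
(1) will reduce to

$$\Big(\sum_i a_{ii}\Big) \frac{d^2 g}{dt^2}\, g^{n-1} + \Big(\sum_{i \ne j} a_{ij}\Big) \Big(\frac{dg}{dt}\Big)^2 g^{n-2} = K\Big\{n\, \frac{d^2 g}{dt^2}\, g^{n-1} + n(n-1) \Big(\frac{dg}{dt}\Big)^2 g^{n-2}\Big\}, \tag{2}$$

where $K = E\{(\sum a_{ij} X_i X_j)/(\sum X_j)^2\}$. Then, dividing by $g^n$ (which does not vanish
in a neighbourhood of t = 0), we have

$$\Big(\sum_i a_{ii}\Big) \frac{d^2 g/dt^2}{g} + \Big(\sum_{i \ne j} a_{ij}\Big) \Big(\frac{dg/dt}{g}\Big)^2 = K\Big\{n\, \frac{d^2 g/dt^2}{g} + n(n-1) \Big(\frac{dg/dt}{g}\Big)^2\Big\}. \tag{3}$$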


Writing $\psi(t) = \ln g(t)$, we have

$$\frac{d\psi}{dt} = \frac{dg/dt}{g}, \qquad \frac{d^2\psi}{dt^2} = \frac{d^2 g/dt^2}{g} - \Big(\frac{dg/dt}{g}\Big)^2.$$

Substituting these in (3), we obtain the following differential equation for ψ(t):

$$A\, \frac{d^2\psi}{dt^2} + B \Big(\frac{d\psi}{dt}\Big)^2 = 0, \qquad A = \sum a_{ii} - nK, \quad B = \sum a_{ij} - n^2 K, \tag{4}$$

the sum in B being over all i and j, together with the initial conditions

$$\frac{d\psi}{dt}\Big|_{t=0} = im, \qquad \frac{d^2\psi}{dt^2}\Big|_{t=0} = -\sigma^2.$$

Here m and σ² are respectively the mean and variance of the distribution of X.
In the solution of this differential equation (4), three cases must be distinguished:

I. A ≠ 0, B ≠ 0;
II. A ≠ 0, B = 0;
III. A = 0, B ≠ 0.


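For Case I, the differential equation may be written as

$$\frac{d^2\psi}{dt^2} = \frac{\sigma^2}{m^2} \Big(\frac{d\psi}{dt}\Big)^2, \tag{5}$$

using the initial conditions in (4), which give $-A\sigma^2 - Bm^2 = 0$, that is, $B/A = -\sigma^2/m^2$
(in Case I, $m \ne 0$, since m = 0 would force A = 0). Writing $\xi(t) = d\psi/dt$, equation (5) reduces to

$$\frac{d\xi}{dt} = \frac{\sigma^2}{m^2}\, \xi^2. \tag{6}$$

Integrating this differential equation (6) with respect to t, using the initial condition
ξ(0) = im, we get

$$\xi(t) = \frac{im}{1 - (\sigma^2/m)\, it}. \tag{7}$$

From (7), with the initial condition ψ(0) = 0, we get very easily

$$\psi(t) = -\frac{m^2}{\sigma^2} \log\Big[1 - \frac{\sigma^2}{m}\, it\Big], \quad \text{or} \quad g(t) = \Big[1 - \frac{\sigma^2}{m}\, it\Big]^{-m^2/\sigma^2}. \tag{8}$$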


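By applying the inversion theorem, it can be very easily shown that the characteristic
function g(t) in (8) leads uniquely to the gamma distribution with parameters
$\alpha = m/\sigma^2$ and $\beta = m^2/\sigma^2$, the frequency function being given by

$$dF(X) = \begin{cases} \dfrac{1}{\Gamma(\beta)}\, \alpha^{\beta} e^{-\alpha X} X^{\beta-1}\, dX, & X > 0, \\ 0, & X \le 0, \end{cases} \qquad m > 0;$$

$$dF(X) = \begin{cases} 0, & X \ge 0, \\ \dfrac{1}{\Gamma(\beta)}\, (-\alpha)^{\beta} e^{-\alpha X} (-X)^{\beta-1}\, dX, & X < 0, \end{cases} \qquad m < 0. \tag{9}$$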
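As a numerical sanity check of (5) through (9), the Python sketch below (helper names are not from the paper) evaluates ψ(t) from (8) for a chosen mean m and variance σ², confirms the initial conditions of (4) and the differential equation (5) by central finite differences, and checks that α = m/σ², β = m²/σ² give a gamma law with mean β/α = m and variance β/α² = σ².

```python
import cmath


def psi(t, m, var):
    """psi(t) = log g(t) from (8).

    For real t, 1 - (var/m) i t has real part 1, so the principal logarithm is continuous in t.
    """
    return -(m * m / var) * cmath.log(1 - (var / m) * 1j * t)


def derivs(fun, t, h=1e-4):
    """Central finite-difference estimates of fun'(t) and fun''(t)."""
    d1 = (fun(t + h) - fun(t - h)) / (2 * h)
    d2 = (fun(t + h) - 2 * fun(t) + fun(t - h)) / (h * h)
    return d1, d2


def gamma_params(m, var):
    """alpha = m / sigma^2 and beta = m^2 / sigma^2, as in the text."""
    return m / var, m * m / var


m, var = 2.0, 0.5
d1, d2 = derivs(lambda t: psi(t, m, var), 0.0)
print(d1, d2)  # close to 2j (= im) and -0.5 (= -sigma^2)
```

The residual of (5), $\psi'' - (\sigma^2/m^2)\psi'^2$, is likewise zero up to finite-difference error at other values of t, and the same checks go through for m < 0.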

For cases II and III, it follows from the conditions stated in the theorem (by the lemma
with t = 0, $K = E(\sum a_{ij} X_i X_j)/E\{(\sum X_i)^2\}$) that

$$K = E\Big\{\frac{\sum a_{ij} X_i X_j}{(\sum X_i)^2}\Big\} = \frac{\sigma^2 \sum a_{ii} + m^2 \sum a_{ij}}{n\sigma^2 + n^2 m^2}. \tag{10}$$

Thus the condition B = 0 yields the relation

$$\frac{\sum a_{ij}}{n^2} = K = \frac{\sigma^2 \sum a_{ii} + m^2 \sum a_{ij}}{n\sigma^2 + n^2 m^2}. \tag{11}$$

On simplification, this reduces to $\sum a_{ii} = \sum a_{ij}/n$. Similarly, in Case III, the
condition A = 0 obviously leads to the relation

$$\frac{\sum a_{ii}}{n} = K = \frac{\sigma^2 \sum a_{ii} + m^2 \sum a_{ij}}{n\sigma^2 + n^2 m^2}. \tag{12}$$

On simplification, this also reduces to $\sum a_{ii} = \sum a_{ij}/n$, the same as obtained from
the condition B = 0. Thus an important conclusion is reached that whenever
the matrix $A = (a_{ij})$ is such that its elements satisfy the relation $\sum a_{ii} = \sum a_{ij}/n$,
both the coefficients A and B of the differential equation (4) vanish simultaneously,
thus leading to no solution of the problem.

Since cases II and III are excluded by our assumption $\sum a_{ii} \ne \sum a_{ij}/n$, the
problem leads uniquely to the solution obtained in (9). When the
matrix $A = (a_{ij})$ is either positive definite or negative definite, the relation
$\sum a_{ii} \ne \sum a_{ij}/n$ is always satisfied for $n \ge 2$: then $|\sum a_{ii}|$ is the sum of the absolute
values of the eigenvalues, which exceeds the largest of them, while $|\sum a_{ij}|/n$ cannot exceed
the largest. Thus the equality $\sum a_{ii} = \sum a_{ij}/n$ may hold
only for matrices which are neither positive definite nor negative definite.
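Formula (10) can be checked by simulation in a case where the conclusion of the theorem holds. For gamma variates the ratio $(\sum a_{ij}X_iX_j)/(\sum X_i)^2$ is scale-free, so by Pitman's result quoted at the start it is independent of the sum, and its mean should equal the right side of (10). The Python sketch below uses `random.gammavariate` from the standard library (shape and scale arguments); the matrix, shape, scale, number of draws and seed are arbitrary illustrative choices.

```python
import random


def sample_ratio_mean(a, shape, scale, n_draws, seed=0):
    """Monte Carlo mean of (sum a_ij X_i X_j) / (sum X_i)^2 for i.i.d. gamma X_i."""
    rng = random.Random(seed)
    n = len(a)
    total = 0.0
    for _ in range(n_draws):
        x = [rng.gammavariate(shape, scale) for _ in range(n)]
        s = sum(x)
        q = sum(a[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
        total += q / (s * s)
    return total / n_draws


def k_formula(a, m, var):
    """Right side of (10): (sigma^2 sum a_ii + m^2 sum a_ij) / (n sigma^2 + n^2 m^2)."""
    n = len(a)
    trace = sum(a[i][i] for i in range(n))
    total = sum(sum(row) for row in a)
    return (var * trace + m * m * total) / (n * var + n * n * m * m)


a = [[2.0, 1.0, 0.0], [1.0, 3.0, 0.0], [0.0, 0.0, 1.0]]
shape, scale = 2.0, 1.5
m, var = shape * scale, shape * scale ** 2  # gamma mean and variance
print(k_formula(a, m, var), sample_ratio_mean(a, shape, scale, 50000, seed=1))
```

The two printed numbers should agree to about two decimal places; the exact value of (10) here is 99/94.5.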

COROLLARY. Let $X_1, X_2, \ldots, X_n$ be identically distributed independent random
variables with a finite second moment. If the ratio of the linear functions of random
variables given by $(a_1X_1 + \cdots + a_nX_n)/(X_1 + \cdots + X_n)$ is distributed independently
of the sum $X_1 + X_2 + \cdots + X_n$, then each X will follow a gamma distribution.

PROOF. From the statement above, it follows that the conditional expectation
of $(a_1X_1 + \cdots + a_nX_n)^2/(X_1 + \cdots + X_n)^2$ for the fixed sum $X_1 + \cdots + X_n$
is equal to its unconditional expectation. Here the elements of the matrix A are
given by $a_{ij} = a_i a_j$ for $i, j = 1, 2, \ldots, n$, so that $\sum a_{ii} = \sum a_i^2$ and $\sum a_{ij}/n = (\sum a_i)^2/n$.
By the Schwarz inequality, $\sum a_i^2 \ge (\sum a_i)^2/n$, with equality when and only when all
the $a_i$'s are equal; this trivial case is excluded, since it reduces the ratio of the
linear functions to a constant. Hence the relation $\sum a_{ii} \ne \sum a_{ij}/n$ is always
satisfied and the proof follows at once.

MATCHING IN PAIRED COMPARISONS
By J. L. HODGES, JR. AND E. L. LEHMANN¹
University of California, Berkeley
1. One of the simplest designs for testing the effect of a treatment is the 
method of paired comparisons: 2n subjects are divided into n pairs, and within 
each pair the treatment is assigned at random to one of the two subjects while 
the other is used as a control. This method has the reputation of being most 
effective if the subjects within each pair are as closely matched as possible. 
We shall show below that while this is true in the situations occurring most 
commonly in practice, it is not correct universally. 


Received 11/2/53.

We are interested in the power of the one-sided sign test for testing the hypothesis
H of no effect against the simple alternative K that the treatment has
a specified positive effect.

Consider now a possible pair of subjects and assume the usual model: the score of A, B is composed of a true value a, b, an error term U, V and, in case the treatment is applied and is effective, a treatment effect t. Then if X and Y are the scores of A and B, respectively, we have under H: X = a + U, Y = b + V, while under K the quantity t is added to the score of the treated subject. We assume that U and V are identically and independently distributed according to a continuous distribution F, and denote by G the distribution of V − U. Then if the treatment is applied to A or B, with probability ½ each, the probability that the score of the treated subject exceeds that of the untreated one is ½ under H, and


$$\tfrac{1}{2}\,[G(t + \Delta) + G(t - \Delta)]$$


under K, where $\Delta = b - a$. Without loss of generality, $\Delta$ may be taken as nonnegative.

If A and B are perfectly matched, then $\Delta = 0$ and the probability that the treated subject has the greater score becomes $G(t)$. Perfect matching can therefore be guaranteed to give the highest power against all alternatives if and only if

$$\tfrac{1}{2}\,[G(t + \Delta) + G(t - \Delta)] \le G(t) \quad \text{for all } t \ge 0 \text{ and all } \Delta. \tag{1}$$


This condition clearly implies that $G(u)$ is concave for $u \ge 0$; that the converse is also true is at once obvious for $\Delta \le t$. For $t < \Delta \le 2t$, note that the values of $G$ involved in (1) are unaltered if in the interval $[t - \Delta, \Delta - t]$ the function $G$ is replaced by its chord. The resulting curve is concave to the right of $t - \Delta$ and (1) follows. Finally, for $\Delta > 2t$, we note that, since $G$ is symmetric about 0 (that is, $G(-u) = 1 - G(u)$), (1) is equivalent to

$$G(\Delta + t) - G(\Delta - t) \le G(t) - G(-t) \quad \text{for all } t \ge 0,\ \Delta \ge 0. \tag{2}$$

This time replace $G$ by its chord in the interval $[-t, t]$ to establish (1).

Matters simplify if we assume that $G$ has a density $g$. Then the concavity of $G$ for $u \ge 0$ is equivalent to the requirement that the symmetric function $g(u)$ be decreasing for $u \ge 0$, and hence unimodal (with mode 0). In summary, a necessary and sufficient condition for perfect matching to give always the greatest power is that the density $g$ be unimodal.
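Condition (1) is easy to probe on a grid. Here is a minimal sketch that assumes normal errors with unit variance, so that $G$ is the $N(0, 2)$ distribution function. The function names are our own. A maximum of zero (attained at $\Delta = 0$) means that (1) holds on the grid.

```python
import math

def G_normal(u, sigma=1.0):
    """CDF of V - U when U, V are i.i.d. N(0, sigma^2), i.e. of N(0, 2 sigma^2)."""
    return 0.5 * (1.0 + math.erf(u / (2.0 * sigma)))

def max_violation(G, t_values, delta_values):
    """Largest value of (1/2)[G(t+D) + G(t-D)] - G(t) over the grid; <= 0 means (1) holds."""
    return max(0.5 * (G(t + d) + G(t - d)) - G(t)
               for t in t_values for d in delta_values)

grid = [0.1 * i for i in range(61)]  # t and Delta in [0, 6]
print(max_violation(G_normal, grid, grid) <= 1e-12)  # True
```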

It is clear that there are distributions $F$ of the error $U$ for which this condition holds. The normal case is an example, since then $G$ is again a normal distribution. However, it is also easy to give examples for which the condition is not satisfied. Let $F$ be uniformly distributed over the union of the intervals (0, 1) and (4, 5). Then $g(u) = 0$ for $1 < |u| < 3$ and is positive for $3 < |u| < 5$. In this extreme example the gain in power may be considerable. We have $G(1) = G(3) = \tfrac34$ and $G(5) = 1$. With $t = 3$ the probability that the treated subject exceeds the untreated one is $\tfrac34$ when $\Delta = 0$ and $\tfrac78$ when $\Delta = 2$. If we use 10 pairs and consider the treatment as significant when the response of the treated subject is higher in eight or more pairs, the significance level is .055. The power against a treatment effect $t = 3$ is then only .526 when identical subjects are paired but rises to .880 when the subjects in each pair have a response difference $\Delta = 2$. Thus, for certain error distributions and sizes of treatment effects, it is possible to improve the power of the test substantially by purposely mismatching.²
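The figures in this example can be reproduced directly. The sketch below uses helper names of our own. It builds $G$ for the uniform mixture as a mixture of three triangular distributions and forms the success probabilities $\tfrac12[G(t+\Delta) + G(t-\Delta)]$. It then evaluates the binomial tails for 10 pairs with critical value 8.

```python
from math import comb

def tri_cdf(x, c):
    """CDF of c + (V' - U') with U', V' i.i.d. uniform(0, 1): triangular on (c - 1, c + 1)."""
    if x <= c - 1:
        return 0.0
    if x <= c:
        return (x - c + 1) ** 2 / 2
    if x <= c + 1:
        return 1 - (c + 1 - x) ** 2 / 2
    return 1.0

def G_mix(x):
    """CDF of V - U for U, V i.i.d. uniform on (0, 1) U (4, 5)."""
    return 0.5 * tri_cdf(x, 0) + 0.25 * tri_cdf(x, 4) + 0.25 * tri_cdf(x, -4)

def success_prob(t, delta):
    """P(treated subject scores higher) = (1/2)[G(t + delta) + G(t - delta)]."""
    return 0.5 * (G_mix(t + delta) + G_mix(t - delta))

def upper_tail(n, p, r):
    """P(Binomial(n, p) >= r)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(r, n + 1))

level = upper_tail(10, 0.5, 8)
power_matched = upper_tail(10, success_prob(3, 0), 8)
power_mismatched = upper_tail(10, success_prob(3, 2), 8)
print(round(level, 3), round(power_matched, 3), round(power_mismatched, 3))
# 0.055 0.526 0.88
```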

It appears that to use the possibility of improving the power (when it exists), 
one must know the distribution G. But if G were known, one could obtain a 
more powerful test based on the differences themselves, instead of just on the 
signs of differences. This is the very common difficulty, that the choice of an 
optimum design depends on knowledge which a priori was assumed unavailable. 
However, while values of nuisance parameters, form of distributions, etc., 
frequently are not sufficiently well known for the validity of the test to depend 
on this knowledge, one does have some idea about them, which may be utilized 
in the design of the experiment. The statistical procedure then will be valid, 
whether one’s ideas are correct or not. Only the sensitivity of the experiment 
will be affected by the accuracy of these ideas. 

In the next section we shall show that $g$ is unimodal whenever $F$ has a unimodal density, and this is the case in most applications. However, bimodal error distributions do occur, particularly when there is the possibility of “gross error.” In such cases mismatching may increase the power of the test.

2. The purpose of this section is to prove that the difference of two independent 
observations on a unimodal random variable has also a unimodal distribution. 
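We note that the same is not true of the sum, as has been pointed out by Chung [1], who gives a counterexample. It is also easy to see that our condition is not a necessary one by considering

$$P(X = 1) = \tfrac14 \quad\text{and}\quad P(X = 0) = P(X = 2) = \tfrac38,$$

for which $X$ is not unimodal, while $X - Y$, with $P(X - Y = 0) = \tfrac{11}{32}$, $P(X - Y = \pm 1) = \tfrac{3}{16}$ and $P(X - Y = \pm 2) = \tfrac{9}{64}$, is.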


Definition. We say that a random variable $X$ is unimodal with mode $m$

(a) in the discrete case, if the possible values of $X$ are equally spaced numbers $m, m \pm \Delta, m \pm 2\Delta, \ldots$, and

$$\cdots \le P(X = m - 2\Delta) \le P(X = m - \Delta) \le P(X = m) \ge P(X = m + \Delta) \ge P(X = m + 2\Delta) \ge \cdots;$$

(b) or, in the continuous case, if $X$ has a density function $f$ which is increasing for $x < m$ and decreasing for $x > m$.
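Definition (a) and the behaviour of differences can be checked mechanically for finite lattice distributions. In this sketch the names are our own. `is_unimodal` tests definition (a) on a list of probabilities at consecutive lattice points, and `difference_pmf` computes the law of $X - Y$. The values $P(X=1) = \tfrac14$, $P(X=0) = P(X=2) = \tfrac38$ are chosen here for illustration. With them $X$ is not unimodal, yet $X - Y$ is.

```python
from fractions import Fraction

def is_unimodal(pmf):
    """Definition (a) on consecutive lattice points: nondecreasing up to some
    index, nonincreasing afterwards."""
    i = 0
    while i + 1 < len(pmf) and pmf[i] <= pmf[i + 1]:
        i += 1
    return all(pmf[j] >= pmf[j + 1] for j in range(i, len(pmf) - 1))

def difference_pmf(pmf):
    """pmf of X - Y on -(n-1), ..., n-1 for X, Y i.i.d. with P(X = i) = pmf[i]."""
    n = len(pmf)
    out = [Fraction(0)] * (2 * n - 1)
    for i, p in enumerate(pmf):
        for j, q in enumerate(pmf):
            out[i - j + n - 1] += p * q
    return out

pmf_x = [Fraction(3, 8), Fraction(1, 4), Fraction(3, 8)]  # P(X=0), P(X=1), P(X=2)
print(is_unimodal(pmf_x))                  # False
print(difference_pmf(pmf_x))
print(is_unimodal(difference_pmf(pmf_x)))  # True
```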
We shall need the following inequality. 


Theorem 1. Let $(a_1, a_2, \ldots, a_n)$ be a sequence of real numbers satisfying

$$0 \le a_1 \le a_2 \le \cdots \le a_m \ge a_{m+1} \ge \cdots \ge a_n \ge 0 \tag{3}$$

for some $1 \le m \le n$. Let $S_k = a_1a_{1+k} + a_2a_{2+k} + \cdots + a_{n-k}a_n$ for $k = 0, 1, \ldots, n-1$. Then $S_0 \ge S_1 \ge \cdots \ge S_{n-1}$.

² It should, however, be pointed out that the corresponding possibility does not exist if one is interested in a point estimate of the treatment effect.


Proof. Fix $k \ge 0$ and prove $S_k \ge S_{k+1}$ for $n \ge k + 2$. For $n = k + 2$ our proposition becomes

$$a_1a_{k+1} + a_2a_{k+2} \ge a_1a_{k+2},$$

which is easily verified: $a_1a_{k+1} \ge a_1a_{k+2}$ unless $a_{k+1} < a_{k+2}$, in which case $a_1 \le a_2$ and $a_2a_{k+2} \ge a_1a_{k+2}$. We induct on $n$. Let there be given any sequence $(a_1, \ldots, a_n)$ satisfying (3), with $n > k + 2$. We may assume $m > 1 + k$, since otherwise we have easily

$$S_k - S_{k+1} = a_1(a_{k+1} - a_{k+2}) + \cdots + a_{n-k-1}(a_{n-1} - a_n) + a_{n-k}a_n \ge 0.$$

Since we also have

$$S_k - S_{k+1} = a_1a_{k+1} + (a_2 - a_1)a_{k+2} + \cdots + (a_{n-k} - a_{n-k-1})a_n,$$

the theorem is obvious if $m \ge n - k$. We therefore now assume $1 + k < m < n - k$.

Let us consider the sequence $(a_1, \ldots, a_{m-1}, a_{m+1}, \ldots, a_n)$ obtained from the given sequence by dropping $a_m$, and let $S'_k$ denote the sums of products for the new sequence. Note that the new sequence also satisfies (3). We have

$$\begin{aligned}
S'_k &= (a_1a_{1+k} + \cdots + a_{m-k-1}a_{m-1}) + (a_{m-k}a_{m+1} + \cdots + a_{m-1}a_{m+k}) + (a_{m+1}a_{m+1+k} + \cdots + a_{n-k}a_n)\\
&= S_k + (a_{m-k}a_{m+1} + \cdots + a_{m-1}a_{m+k}) - (a_{m-k}a_m + \cdots + a_ma_{m+k}),\\
S'_{k+1} &= S_{k+1} + (a_{m-k-1}a_{m+1} + \cdots + a_{m-1}a_{m+k+1}) - (a_{m-k-1}a_m + \cdots + a_ma_{m+k+1}).
\end{aligned}$$

When these are differenced we have, transferring the term $a_ma_{m+k+1}$,

$$\begin{aligned}
S_k - S_{k+1} &= (S'_k - S'_{k+1}) + [(a_{m-k-1}a_{m+1} + \cdots + a_{m-1}a_{m+k+1}) - (a_{m-k-1}a_m + a_{m-k}a_{m+1} + \cdots + a_{m-1}a_{m+k})]\\
&\quad + [(a_{m-k}a_m + \cdots + a_ma_{m+k}) - (a_{m-k}a_{m+1} + \cdots + a_ma_{m+k+1})]\\
&= (S'_k - S'_{k+1}) + [a_{m-k}(a_m - a_{m+1}) + \cdots + a_m(a_{m+k} - a_{m+k+1})]\\
&\quad - [a_{m-k-1}(a_m - a_{m+1}) + \cdots + a_{m-1}(a_{m+k} - a_{m+k+1})]\\
&= (S'_k - S'_{k+1}) + [(a_{m-k} - a_{m-k-1})(a_m - a_{m+1}) + \cdots + (a_m - a_{m-1})(a_{m+k} - a_{m+k+1})].
\end{aligned}$$

By the induction hypothesis, $S'_k - S'_{k+1}$ is nonnegative; by the unimodality assumption (3) the term in square brackets is a sum of products of nonnegative factors. We conclude $S_k \ge S_{k+1}$.
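Theorem 1 lends itself to a randomized check. In the sketch below, `lag_sums` and `random_unimodal` are hypothetical helpers of our own. They draw integer sequences satisfying (3) and confirm $S_0 \ge S_1 \ge \cdots$. The sequence (1, 0, 1) violates (3), and it shows that the hypothesis cannot simply be dropped.

```python
import random

def lag_sums(a):
    """S_k = a_1 a_{1+k} + ... + a_{n-k} a_n for k = 0, ..., n-1 (0-based lists)."""
    n = len(a)
    return [sum(a[i] * a[i + k] for i in range(n - k)) for k in range(n)]

def random_unimodal(n, rng):
    """Random nonnegative integer sequence satisfying (3): nondecreasing up to
    a peak at position m, then nonincreasing."""
    m = rng.randint(1, n)
    up = sorted(rng.randint(0, 20) for _ in range(m))
    down = sorted((rng.randint(0, up[-1]) for _ in range(n - m)), reverse=True)
    return up + down

def theorem1_holds(trials=2000, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        S = lag_sums(random_unimodal(rng.randint(1, 12), rng))
        if any(S[k] < S[k + 1] for k in range(len(S) - 1)):
            return False
    return True

print(theorem1_holds())     # True
print(lag_sums([1, 0, 1]))  # [2, 0, 1]: S_1 < S_2 once (3) fails
```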
We can now establish the desired result.

Theorem 2. If $X$ and $Y$ are independent observations on the same unimodal random variable, then $X - Y$ is unimodal.

We prove the theorem in three parts. 

Part I. If $X$ has as possible values only finitely many integers, the theorem is an immediate consequence of the preceding one. The $a$'s are taken to be the probabilities of the successive possible values of $X$. Since $P(X - Y = k) = S_k$ for $k$ a nonnegative integer, and since $X - Y$ has a distribution symmetric about 0, the theorem follows.

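Part II. Let the possible values of $X$ now be numbers of the form $r\Delta$, where $\Delta > 0$ and $r$ is any integer. For simplicity we may assume 0 to be a mode. For every positive integer $s$, define

$$X_s = \begin{cases} X & \text{if } |X| \le s,\\ 0 & \text{if } |X| > s,\end{cases} \qquad Y_s = \begin{cases} Y & \text{if } |Y| \le s,\\ 0 & \text{if } |Y| > s.\end{cases}$$

Since the mass removed from the tails is placed at the mode 0, $X_s$ is again unimodal with finitely many possible values, so that $X_s - Y_s$ has a unimodal distribution by Part I. But since $P(X_s - Y_s \ne X - Y) \to 0$ as $s \to \infty$, we see that $X - Y$ must also have a unimodal distribution.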

Part III. Now suppose $X$ has a density $f$, with mode at $m$. For each positive integer $s$, define

$$X'_s = [(X - m)\sqrt{s}\,]/\sqrt{s},$$

where $[u]$ denotes the greatest integer less than $u$. The cumulative distribution $G_s$ of $X'_s - Y'_s$ cannot ever differ from the distribution $G$ of $X - Y$ by more than a quantity which tends to 0 as $s \to \infty$. However, $G_s$ is unimodal, by Part II. If $G$ were not unimodal, we could find $\epsilon > 0$, $\Delta > 0$, and $u - \Delta > 0$ such that $G(u - \Delta) + G(u + \Delta) > 2G(u) + \epsilon$, which would yield a contradiction.
REFERENCE 
NOTE ON A THEOREM OF LIONEL WEISS¹
By Lucien LeCam 
University of California, Berkeley 


1. Introduction. In a recent paper [1] it was pointed out by Lionel Weiss that the class of sequential probability ratio tests is complete in a very strong sense. The purpose of the present note is to show how this result can be derived from a slight extension of the usual theorems of decision theory, and to generalize this result to the case where the number of alternatives is any finite number. Similar results could also be obtained in more general cases.

Received 2/8/54.
¹ This paper was prepared with the partial support of the Office of Naval Research.


2. Statement of the problem. Let $\{X_n\}$, for $n = 1, 2, \ldots$, be a sequence of mutually independent random vectors. Assume that the distribution of these vectors is given by a sequence of probability measures $\{\pi_{n,i}\}$, for $n = 1, 2, \ldots$, where $i$ is an index taking one of the values $j = 1, 2, \ldots, k$. Suppose that the loss incurred by accepting $j$ while $i$ is true is a finite number $W_{ij}$, strictly positive if $i \ne j$ and equal to zero if $i = j$.

Let $\mathcal{W}$ be the class of matrices $(W_{ij})$ satisfying these conditions. If $i$ is the true state of nature, we will assume that the cost of taking $n$ observations is $C_i(n)$, nonnegative, strictly increasing in $n$ and such that $C_i(n)$ tends to infinity as $n$ tends to infinity. Let $\mathcal{C}$ be the class of all $k$-tuples of cost functions $C = \{C_i(n)\}$ satisfying the preceding conditions. Let $\Theta = J \times \mathcal{W} \times \mathcal{C}$, where $J$ denotes the set of integers $J = \{1, 2, \ldots, k\}$. For a particular decision function $\delta$ and a particular point $\theta = \{i, W, C\} \in \Theta$, let $R(\theta, \delta)$ denote the risk if $\delta$ is used for the state of nature $i$, the loss function $W$, and the cost function $C$.

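If $\tilde{\mathcal{D}}$ is any subset of the set $\mathcal{D}$ of all measurable decision procedures such that

(1) $\tilde{\mathcal{D}}$ contains all $\delta \in \mathcal{D}$ which minimize linear combinations of the form
$$K(\mu, \delta) = \sum_{i=1}^{m} \mu_i R(\theta_i, \delta), \qquad \mu_i > 0, \quad \sum_{i=1}^{m} \mu_i = 1;$$

(2) $\tilde{\mathcal{D}}$ is compact in the sense defined in [2];

then it follows from a modification ([2], Theorem (5)) of a theorem of Wald ([3], Theorem 3.18) that $\tilde{\mathcal{D}}$ is essentially complete. This means that whatever $\delta_0 \in \mathcal{D}$, there exists $\delta_1 \in \tilde{\mathcal{D}}$ such that $R(\theta, \delta_1) \le R(\theta, \delta_0)$ for every $\theta \in \Theta$. Such an essentially complete class is described below.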


3. Description of the complete class. Let $\Lambda$ be the set of probability distributions on $J$. Let $Z_n$ be the vector $Z_n = \{X_1, \ldots, X_n\}$ and let $q(Z_n, p) = \{q_j(Z_n, p)\}$ be the vector representing the a posteriori distribution of $i \in J$ given $Z_n$, when the a priori distribution of $i \in J$ is $p$. Let $p$ be fixed. Consider the class $\tilde{\mathcal{D}}$ of all decision functions defined in the following way. For each $n \ge 0$ choose $k$ closed convex sets $\{S_{n,j}\}$ with $j = 1, 2, \ldots, k$, each contained in $\Lambda$, with disjoint interiors and such that $S_{n,j}$ contains $q$ if $q = \{q_i\}$ with $q_j = 1$.

The decision function $\delta$ consists of the following rule: if $q(Z_n, p) \in S_{n,j}$, then stop and accept $j$; if $q(Z_n, p)$ is not a member of $\bigcup_j S_{n,j}$, then take one more observation; if $q(Z_n, p)$ is a limit point of one or more $S_{n,j}$, randomize appropriately.
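The rule just described can be sketched in a few lines. This is an illustration only, and it rests on assumptions of our own. Observations arrive from an iterator, and each hypothesis $j$ supplies a log-likelihood for one observation. The stopping sets are passed in as a function `accept(n, q)` that returns $j$ when $q \in S_{n,j}$. Boundary randomization is ignored. The example uses two Bernoulli hypotheses and the hypothetical sets $S_{n,j} = \{q : q_j \ge 0.95\}$. These sets are closed and convex, have disjoint interiors and contain the vertex $q_j = 1$, and with them the rule reduces to an ordinary sequential probability ratio test.

```python
import math

def sequential_posterior_rule(stream, log_likelihoods, prior, accept, max_n=10_000):
    """Sample until the posterior q(Z_n, p) falls in a stopping set S_{n,j}.

    log_likelihoods[j](x): log density of one observation under hypothesis j.
    accept(n, q): index j if q lies in S_{n,j}, otherwise None.
    Returns (accepted hypothesis j, number of observations n).
    """
    log_w = [math.log(p) for p in prior]        # log unnormalized posterior weights
    for n in range(max_n + 1):
        top = max(log_w)
        w = [math.exp(v - top) for v in log_w]  # subtract max for numerical stability
        total = sum(w)
        q = [v / total for v in w]              # posterior distribution on J
        j = accept(n, q)
        if j is not None:
            return j, n
        x = next(stream)                        # take one more observation
        log_w = [v + f(x) for v, f in zip(log_w, log_likelihoods)]
    raise RuntimeError("no decision within max_n observations")

def bernoulli_loglik(p):
    return lambda x: math.log(p) if x == 1 else math.log(1 - p)

def accept(n, q, level=0.95):
    """Hypothetical stopping sets S_{n,j} = {q : q_j >= level}, the same for every n."""
    for j, qj in enumerate(q):
        if qj >= level:
            return j
    return None

hyps = [bernoulli_loglik(0.2), bernoulli_loglik(0.8)]
print(sequential_posterior_rule(iter([1] * 20), hyps, [0.5, 0.5], accept))  # (1, 3)
```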

It is clear that the preceding description uses characteristics not depending explicitly on $p$, so that $p$ may be fixed and, for instance, taken equal to the uniform distribution on $J$. For this class $\tilde{\mathcal{D}}$ the following theorem holds.

Theorem. There exists on $\mathcal{D}$ a topology $\mathfrak{T}$ for which (1) $R(\theta, \delta)$ is lower semicontinuous in $\delta$ for each $\theta \in \Theta$ and (2) $\mathcal{D}$ is compact and $\tilde{\mathcal{D}}$ is a closed, compact subset of $\mathcal{D}$.

If $\delta_0 \in \mathcal{D}$ is any measurable decision function, there exists a $\delta_1 \in \tilde{\mathcal{D}}$ such that $R(\theta, \delta_1) \le R(\theta, \delta_0)$, whatever may be $\theta \in \Theta$.

Proof. A topology $\mathfrak{T}$ having the desired properties has been defined more generally [2] by a process analogous to the one used by Wald [3] for the definition of regular convergence. A classical theorem (see [4], Vol. 1, p. 246; Vol. 2, p. 21; or [5]) states that the space of closed subsets of a compact metric space is compact for the usual definition of distance between sets. It then follows from the relationship between compactness in this sense and compactness in the sense of $\mathfrak{T}$ (or of regular convergence in [3]) that $\tilde{\mathcal{D}}$ is compact. This proves the first part of the theorem.

The second part is an immediate consequence of Theorem (5) in [2], provided we can show that Bayes' solutions belong to $\tilde{\mathcal{D}}$. To show this, let $P_{ij}(\delta)$ be the probability of accepting $j$ if $i$ is true and $\delta$ is used, and let $Q_i(n, \delta)$ be the probability of taking at least $n$ observations if $i$ is true and $\delta$ is used. Let $\theta = \{i, W, C\}$. Then

$$R_i(W, C, \delta) = R(\theta, \delta) = \sum_j W_{ij}P_{ij}(\delta) + \sum_{n=0}^{\infty} C_i(n)\,[Q_i(n, \delta) - Q_i(n+1, \delta)].$$

Consequently,

$$\sum_{i=1}^{m} \mu_i R(\theta_i, \delta) = \sum_{j=1}^{k} p_j R_j(W, C, \delta)$$

for suitable values of $W$, $C$, and $\{p_j\}$. Therefore the Bayes' solutions for our problem have the same structure as the Bayes' solutions for the now classical problem in which $W$ and $C$ are fixed. A very slight modification of the argument given by Arrow, Blackwell, and Girshick [6] yields the desired result. This completes the proof of the theorem.
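The risk decomposition can be evaluated directly once the acceptance probabilities and the tail probabilities $Q_i(n, \delta)$ are known. Because $Q_i(n) - Q_i(n+1)$ is the probability of stopping after exactly $n$ observations, the second sum is simply the expected sampling cost. In this small sketch the function name and the numbers are invented purely for illustration.

```python
def risk(W_row, P_row, C, Q):
    """R_i(W, C, delta) = sum_j W_ij P_ij + sum_n C_i(n) [Q_i(n) - Q_i(n+1)].

    W_row[j], P_row[j]: loss and probability of accepting j under the true state i.
    C: cost function n -> C_i(n).
    Q[n] = P(at least n observations); Q[n] is taken as 0 beyond the list.
    """
    terminal = sum(w * p for w, p in zip(W_row, P_row))
    sampling = sum(C(n) * (Q[n] - (Q[n + 1] if n + 1 < len(Q) else 0.0))
                   for n in range(len(Q)))
    return terminal + sampling

# Illustrative: N = 1, 2, 3 with probabilities 0.5, 0.3, 0.2, so E[N] = 1.7.
Q = [1.0, 1.0, 0.5, 0.2]
print(risk([0.0, 5.0], [0.9, 0.1], lambda n: n, Q))  # approximately 0.5 + 1.7 = 2.2
```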

As a particular case, if $\delta_0 \in \mathcal{D}$ is such that $\lim_{n\to\infty} Q_i(n, \delta_0) = 0$ for $i \in J$, the preceding theorem implies that there exists $\delta_1 \in \tilde{\mathcal{D}}$ satisfying

$$P_{ij}(\delta_0) \ge P_{ij}(\delta_1) \quad \text{for every } i, j \in J,\ i \ne j;$$
$$Q_i(n, \delta_0) \ge Q_i(n, \delta_1) \quad \text{for every } i \in J,\ n \ge 0.$$

If, furthermore, the probabilities $\{\pi_{n,i}\}$ satisfy the condition imposed by Weiss [1], the boundaries of the $S_{n,j}$ have measure zero and randomization is unnecessary.


4. Remarks.

(1) The technique of enlarging the space of strategies of nature, say $\Omega$, to a product $\Omega \times S$ with $S$ finite has been used systematically by Weiss [7], [8] and Lindley [9]. A more general type of extension is implicitly contained in the assumptions of [2]. The standard form of Bayes' solutions given by Theorem 4.7 of [3], or its generalization, usually remains valid under such modifications of $\Omega$.

(2) The proofs of the optimum character of the sequential probability ratio test given in [6] or [10] also make use of classes of Bayes' solutions obtained by varying $W$ and $C$. However, in these proofs $C$ remains proportional to a given cost function. This is not sufficient for the present purpose.
THE DISTRIBUTION OF DISTANCE IN A HYPERSPHERE
By R. D. Lord


The Royal Technical College, Glasgow 


1. In a note with the above title, Hammersley [2] has used ad hoc methods to deal with the distribution of the distance AB, when A and B are points uniformly distributed in a sphere of radius $a$ in $s$ dimensions. I show here how this question may be treated by general methods which I have developed elsewhere [3] for random vectors with spherical distributions. A random vector $\mathbf{r}$ will be said to have a spherical distribution if its probability function is a function of $|\mathbf{r}|$ only.

I start with the observation that the problem is in fact one of the addition of independent random vectors with spherical distributions. We require the distribution of $\mathbf{r}_1 - \mathbf{r}_2$, where $\mathbf{r}_1$ and $\mathbf{r}_2$ are random vectors with the same uniform spherical distribution. But on account of the spherical symmetry, $-\mathbf{r}_2$ has the same distribution as $\mathbf{r}_2$, so that the problem is equivalent to finding the distribution of $\mathbf{r}_1 + \mathbf{r}_2$. It will be dealt with in this form in what follows.

2. The first method uses the polar form of the characteristic function. For any spherical distribution in $s$ dimensions let

$$P(r)\,dr = \Pr\{r < |\mathbf{r}| < r + dr\}.$$
The characteristic function of the distribution of $\mathbf{r}$ is $E(e^{i\boldsymbol{\rho}\cdot\mathbf{r}})$. On changing to polar coordinates it is found (as in [1] or [3]) to be a function of $\rho = |\boldsymbol{\rho}|$ only, and is

$$\Phi(\rho) = \int_0^\infty P(r)\,\Lambda_{s/2-1}(r\rho)\,dr, \tag{1}$$

where

$$\Lambda_\alpha(x) = \Gamma(\alpha + 1)\,(\tfrac12 x)^{-\alpha} J_\alpha(x) = 1 - \frac{(\frac12 x)^2}{\alpha + 1} + \frac{(\frac12 x)^4}{2!\,(\alpha + 1)(\alpha + 2)} - \cdots, \tag{2}$$

with inversion formula

$$P(r) = 2^{2-s}\{\Gamma(\tfrac12 s)\}^{-2} \int_0^\infty (r\rho)^{s-1}\,\Lambda_{s/2-1}(r\rho)\,\Phi(\rho)\,d\rho. \tag{3}$$

It should be emphasised that $\Phi(\rho)$ is the characteristic function of the $s$-dimensional distribution of $\mathbf{r}$ and not of the one-dimensional distribution of $r = |\mathbf{r}|$.
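A quick way to see what $\Lambda_\alpha$ looks like is to sum the series in (2). The sketch below does so (`Lambda` is our own name) and compares the result with two elementary cases, $\Lambda_{-1/2}(x) = \cos x$ and $\Lambda_{1/2}(x) = \sin x / x$. These are the characteristic functions, for $s = 1$ and $s = 3$, of a point at unit distance in a uniformly random direction.

```python
import math

def Lambda(alpha, x, terms=60):
    """Lambda_alpha(x) = Gamma(alpha+1) (x/2)^(-alpha) J_alpha(x), summed from the series
    1 - (x/2)^2/(alpha+1) + (x/2)^4/(2!(alpha+1)(alpha+2)) - ..."""
    total, term = 0.0, 1.0
    y = -(x / 2.0) ** 2
    for r in range(terms):
        total += term
        term *= y / ((r + 1) * (alpha + r + 1))  # ratio of successive series terms
    return total

for x in (0.5, 2.0, 5.0):
    print(x, Lambda(-0.5, x) - math.cos(x), Lambda(0.5, x) - math.sin(x) / x)
```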
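For a distribution uniform in a sphere of radius $a$,

$$P(r) = \begin{cases} s r^{s-1}/a^s, & 0 \le r \le a,\\ 0, & r > a,\end{cases} \tag{4}$$

$$\Phi(\rho) = \Lambda_{s/2}(a\rho). \tag{5}$$

Multiplying characteristic functions and inverting, we obtain the probability function for $\mathbf{r}_1 + \mathbf{r}_2$ as

$$P_s(r) = s\,\Gamma(\tfrac12 s + 1)\,(2r/a^2)^{s/2} \int_0^\infty \rho^{-s/2} J_{s/2}^2(a\rho)\, J_{s/2-1}(r\rho)\,d\rho.$$

This integral is not completely evaluated by Watson [4], but we merely need to make simple substitutions (in line 6 of sec. 13.46 and in equation (2) of sec. 13.4). The result is

$$P_s(r) = \frac{s\,\Gamma(\frac12 s + 1)}{\Gamma(\frac12 s + \frac12)\,\Gamma(\frac12)}\, \frac{r^{s-1}}{a^s} \int_A^\pi \cos^s \tfrac12\phi\,d\phi,$$

where $0 \le A \le \pi$ and $\sin\tfrac12 A = r/2a$. Putting $t = \cos^2\tfrac12\phi$, we obtain Hammersley's form

$$P_s(r) = s\,r^{s-1}a^{-s}\,I_\mu(\tfrac12 s + \tfrac12, \tfrac12), \tag{6}$$

where $\mu = 1 - r^2/4a^2$ and $I_x(p, q)$ is the incomplete Beta function defined by

$$B(p, q)\,I_x(p, q) = \int_0^x t^{p-1}(1 - t)^{q-1}\,dt.$$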
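Formula (6) can be checked numerically without special-function libraries. In this sketch the helper names are our own. The substitution $t = 1 - u^2$ turns $I_x(p, \tfrac12)$ into an integral of the smooth function $(1-u^2)^{p-1}$, which Simpson's rule handles well. The code then confirms that $P_s$ integrates to 1 and that its second moment is $2a^2 s/(s+2)$, the value of $\mu_2$ in (10) of sec. 4. For $s = 1$ and $a = 1$ the density reduces to $1 - r/2$.

```python
import math

def simpson(f, lo, hi, n=400):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    total += 4 * sum(f(lo + (2 * i - 1) * h) for i in range(1, n // 2 + 1))
    total += 2 * sum(f(lo + 2 * i * h) for i in range(1, n // 2))
    return total * h / 3

def incomplete_beta_half(p, x):
    """Regularized I_x(p, 1/2).  With t = 1 - u^2,
    B(p, 1/2) I_x(p, 1/2) = 2 * integral_{sqrt(1-x)}^{1} (1 - u^2)^(p-1) du."""
    beta = math.gamma(p) * math.gamma(0.5) / math.gamma(p + 0.5)
    g = lambda u: (1.0 - u * u) ** (p - 1)
    return 2.0 * simpson(g, math.sqrt(1.0 - x), 1.0, 200) / beta

def distance_density(r, s, a=1.0):
    """(6): density of |r1 + r2| (equivalently |r1 - r2|) for two independent
    points uniform in an s-dimensional sphere of radius a."""
    if not 0.0 <= r <= 2.0 * a:
        return 0.0
    mu = 1.0 - r * r / (4.0 * a * a)
    return s * r ** (s - 1) * a ** (-s) * incomplete_beta_half(0.5 * s + 0.5, mu)

for s in (1, 3, 5):
    total = simpson(lambda r: distance_density(r, s), 0.0, 2.0)
    second = simpson(lambda r: r * r * distance_density(r, s), 0.0, 2.0)
    print(s, round(total, 6), round(second, 6), round(2 * s / (s + 2), 6))
```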


3. In the second method the distributions are treated as projections of spherical distributions in a space of a higher number of dimensions. It is clear that from a spherical distribution of random vectors with $O$ for center (i.e. $O$ is the point $\mathbf{r} = 0$) we obtain another spherical distribution with center $O$ if we project the vectors orthogonally onto a space of lower dimensions through $O$. A simple calculation [3] shows that if any spherical distribution in space of $(s + 2m)$ dimensions is projected onto a space of $s$ dimensions, the corresponding probability functions $P^{(s+2m)}(r)$ and $P^{(s)}(r)$ satisfy

$$P^{(s)}(r) = \frac{2\,\Gamma(\frac12 s + m)}{\Gamma(\frac12 s)\,\Gamma(m)}\, r^{s-1} \int_r^\infty P^{(s+2m)}(t)\,(t^2 - r^2)^{m-1}\,t^{-s-2m+2}\,dt. \tag{7}$$

If the distribution in the higher space is uniform over the surface of a sphere of radius $a$, then

$$P^{(s)}(r) = \begin{cases} \dfrac{2\,\Gamma(\frac12 s + m)}{\Gamma(\frac12 s)\,\Gamma(m)}\, r^{s-1}(a^2 - r^2)^{m-1}a^{-s-2m+2}, & 0 \le r \le a,\\ 0, & r > a.\end{cases} \tag{8}$$

When $m = 1$, this reduces to (4).

This shows that a uniform distribution through the volume of an $s$-dimensional sphere can be obtained by projection from a uniform distribution over the surface of an $(s + 2)$-dimensional sphere, each sphere having radius $a$. In the case $s = 1$, we see that a distribution uniform over a diameter can be obtained by projection from a distribution uniform over the surface of a sphere. This is essentially Archimedes' theorem on the surface area of a sphere.
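The projection statement is easy to check by simulation. This minimal sketch assumes NumPy is available, and `project_sphere` is our own helper. Points uniform on the surface of an $(s+2)$-dimensional sphere are generated by normalizing Gaussian vectors, and only the first $s$ coordinates are kept. The empirical law of the projected radius is then compared with $P(|\mathbf{r}| \le \rho) = (\rho/a)^s$, the distribution function corresponding to (4). With $s = 1$ this is Archimedes' observation that one coordinate of a uniform point on an ordinary sphere is uniform on a diameter.

```python
import numpy as np

def project_sphere(s, n_samples, a=1.0, seed=0):
    """Uniform points on the surface of a radius-a sphere in s + 2 dimensions,
    orthogonally projected onto the first s coordinates."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n_samples, s + 2))
    z *= a / np.linalg.norm(z, axis=1, keepdims=True)  # now uniform on the sphere
    return z[:, :s]

a = 1.0
for s in (1, 3, 5):
    radii = np.linalg.norm(project_sphere(s, 200_000, a), axis=1)
    for rho in (0.25, 0.5, 0.75):
        print(s, rho, round(float(np.mean(radii <= rho)), 4), (rho / a) ** s)
```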

Now for the sum of two vectors, each with a distribution uniform over the surface of an $(s + 2)$-dimensional sphere, we can appeal to a special case of Kluyver's original solution of the problem of random flights, or rather to the generalisation to any number of dimensions given by Watson [4]. From his results ([4], secs. 13.48 and 13.46 (3)), it follows that the probability function is

$$T(r) = \begin{cases} \dfrac{2^{1-s}\,\Gamma(\frac12 s + 1)}{\Gamma(\frac12 s + \frac12)\,\Gamma(\frac12)}\, \dfrac{r^s (4a^2 - r^2)^{(s-1)/2}}{a^{2s}}, & 0 \le r \le 2a,\\ 0, & r > 2a.\end{cases} \tag{9}$$

Substituting in (7) with $m = 1$, we obtain $P_s(r)$ as a multiple of $r^{s-1}\int_r^{2a}(4a^2 - t^2)^{(s-1)/2}\,dt$, and then (6) follows.

If the distribution of each of $\mathbf{r}_1$ and $\mathbf{r}_2$ is according to (8), with $m$ not necessarily equal to 1, then it is the projection from space of $(s + 2m)$ dimensions of a distribution uniform over the surface of a sphere. The argument just used is applicable and shows that

$$P_s(r) = k\,r^{s-1}\int_r^{2a} (4a^2 - t^2)^{(s+2m-3)/2}\,(t^2 - r^2)^{m-1}\,dt,$$

where $k$ is a constant. When $m = 1$ this reduces to (6), but is otherwise more complicated. The result is still true when $2m$ is not an integer.
4. Hammersley proves that for large values of $s$ the distance between two points in the sphere is nearly always equal to $a\sqrt{2}$, the diagonal of the rectangle determined by orthogonal radii. He does this by showing that as $s$ tends to infinity, $|\mathbf{r}_1 + \mathbf{r}_2|$ is asymptotically distributed in a normal distribution with mean $a\sqrt{2}$ and variance $a^2/2s$.

From the characteristic function it is seen that Hammersley's result is a corollary of a more general one, namely that the $s$-dimensional distribution given by (8) is asymptotically normal with second moment $a^2 s(s + 2m)^{-1}$. Here a normal distribution has the probability function

$$P(r) = C_s\,r^{s-1}\exp(-\tfrac12 s r^2/\mu_2),$$

where $\mu_2$ is the second moment and $C_s$ a constant, and has the characteristic function

$$\Phi(\rho) = \exp(-\tfrac12 \mu_2\rho^2/s).$$


The distribution (8) has characteristic function $\Lambda_{s/2+m-1}(a\rho)$. This can be verified by direct calculation, or derived from the facts that a spherical distribution and its projections (in the sense of sec. 3) all have the same characteristic function (proved in [3]), and that a distribution uniform over the surface of a sphere of radius $a$ in $s + 2m$ dimensions obviously has the characteristic function $\Lambda_{s/2+m-1}(a\rho)$. Now

$$\Lambda_{s/2+m-1}(a\rho) = 1 - \frac{a^2\rho^2}{2(s + 2m)} + \frac{a^4\rho^4}{8(s + 2m)(s + 2m + 2)} - \cdots \sim \exp\left\{-\frac{a^2\rho^2}{2(s + 2m)}\right\}$$

as $s$ tends to infinity, uniformly in any $\rho$-interval. Thus the distribution (8) is asymptotically normal with $\mu_2 = a^2 s(s + 2m)^{-1}$.
Taking $m = 1$, we obtain the distribution (4), which is therefore asymptotically normal with $\mu_2 = a^2 s(s + 2)^{-1}$. The distribution of $\mathbf{r}_1 + \mathbf{r}_2$ is thus asymptotically normal with

$$\mu_2 = 2a^2 s(s + 2)^{-1}. \tag{10}$$

Taking $m = 0$, we see that the distribution uniform over the surface of a sphere of radius $b$ is asymptotic to a normal distribution with $\mu_2 = b^2$. Comparing with (10), we see that the distribution of $\mathbf{r}_1 + \mathbf{r}_2$ is asymptotic to a distribution uniform over the surface of a sphere of radius $a(2s)^{1/2}(s + 2)^{-1/2} \sim a\sqrt{2}\,(1 - s^{-1})$. This is equivalent to Hammersley's result.
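Hammersley's limit can also be seen by simulation. The sketch below assumes NumPy, and `uniform_ball` is our own helper. It draws independent pairs of points uniform in an $s$-dimensional sphere of radius $a$, using a uniform direction and the radius $aU^{1/s}$, which has density (4). It then reports the mean distance and the variance rescaled by $2s/a^2$, which should be close to 1 if the variance is about $a^2/2s$. It also prints the radius $a(2s)^{1/2}(s+2)^{-1/2}$ for comparison.

```python
import numpy as np

def uniform_ball(n, s, a, rng):
    """n independent points uniform in the s-dimensional sphere of radius a."""
    z = rng.standard_normal((n, s))
    z /= np.linalg.norm(z, axis=1, keepdims=True)  # uniform direction
    radius = a * rng.random(n) ** (1.0 / s)        # density s r^(s-1) / a^s on [0, a]
    return z * radius[:, None]

rng = np.random.default_rng(1)
a = 1.0
results = {}
for s in (10, 100, 400):
    d = np.linalg.norm(uniform_ball(5000, s, a, rng) - uniform_ball(5000, s, a, rng), axis=1)
    results[s] = (float(d.mean()), float(d.var() * 2 * s / a**2))
    print(s, results[s], a * np.sqrt(2 * s / (s + 2)))
```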

We could avoid the use of characteristic functions in an increasing number 
of dimensions by projecting onto a diametral subspace of a fixed number of di- 
mensions. Since projection does not alter the characteristic function, the result- 
ing calculation will be the same. 
EXTREME VALUES IN SAMPLES FROM m-DEPENDENT STATIONARY
STOCHASTIC PROCESSES 


By G. S. Watson 
University of Melbourne, Australia 


Summary. The limiting distributions for the order statistics of $n$ successive observations in a sequence of independent and identically distributed random variables are shown to hold also when the sequence is generated by a stationary stochastic process of a certain moving average type.

A sequence of random variables $\{x_i\}$ has been called $m$-dependent [3] if $|i - j| > m$ implies that $x_i$ and $x_j$ are independent. If the variables in a strictly stationary sequence are $m$-dependent and have a finite upper bound to their range of variation, the largest in a sample of $n$ successive members tends with probability one to this upper bound. This is a simple extension of Dodd's results [1] for the case of independence.

The following theorem shows that when this upper bound is infinite, the 
asymptotic distribution of the largest in such a sample is the same as in the case 
of independence. 

Theorem. Let $\{x_i\}$ be a sequence of random variables, unbounded above and generated by an $m$-dependent strictly stationary stochastic process with the property that

$$\lim_{c\to\infty} \max_{0 < |i-j| \le m} \frac{P[(x_i > c), (x_j > c)]}{P(x_i > c)} = 0. \tag{1}$$

Then, if $\xi = nP[x_i > c_n(\xi)]$, for $\xi$ fixed,

$$\lim_{n\to\infty} P[x_i \le c_n(\xi);\ i = 1, \ldots, n] = e^{-\xi}.$$

Proof. Using the formula for the probabilities of the joint occurrence of a set of events in terms of probabilities of occurrence of their contraries (Feller [2], p. 61), we have, for any even integer $l \le n$, that $P[x_i \le c_n(\xi);\ i = 1, \ldots, n]$ is bounded below and above, respectively, by

$$1 - \sum P(x_i > c) + \cdots + (-1)^{l-1}\sum P[(x_{i_1} > c), \ldots, (x_{i_{l-1}} > c)],$$
$$1 - \sum P(x_i > c) + \cdots + (-1)^{l}\sum P[(x_{i_1} > c), \ldots, (x_{i_l} > c)],$$

where, for brevity, $c = c_n(\xi)$. Clearly, $\sum P(x_i > c) = nP(x_i > c) = \xi$. Now

$$\sum_{i<j} P[(x_i > c), (x_j > c)] = \sum_{i=1}^{m} (n - i)\,P[(x_1 > c), (x_{1+i} > c)] + P(x_i > c)^2\,\tfrac12(n - m - 1)(n - m).$$

But

$$\sum_{i=1}^{m} (n - i)\,P[(x_1 > c), (x_{1+i} > c)] \le \left(mn - \frac{m(m+1)}{2}\right) \max_{0<|i-j|\le m} P[(x_i > c), (x_j > c)] = m\xi\left(1 - \frac{m+1}{2n}\right) \max_{0<|i-j|\le m} \frac{P[(x_i > c), (x_j > c)]}{P(x_i > c)}.$$

Since, as $n \to \infty$ with $\xi$ fixed, $c = c_n(\xi) \to \infty$, condition (1) shows that the last expression tends to zero. Hence

$$\lim_{n\to\infty} \sum_{i<j} P[(x_i > c), (x_j > c)] = \tfrac12\xi^2.$$

The general sum $\sum P[(x_{i_1} > c), \ldots, (x_{i_q} > c)]$ contains $\binom{n}{q}$ terms. Of these, there are order $n$ terms in which none of the $x_i$ appearing ever differs in its subscript by more than $m$ from its nearest neighbours, order $n^2$ terms in which only one $x_i$ differs in its subscript by more than $m$ from its nearest neighbours, and so on, provided that $n$ is large enough for all the cases to occur. There are $\sim n^q/q!$ terms in which each $x_i$ is separated in its subscript by more than $m$ from its neighbours. These terms may be said to belong to the first, second, ..., $q$th class. The sum of terms of the first class will be less than a constant times $n \max_{0<|i-j|\le m} P[(x_i > c), (x_j > c)]$, the sum of terms of the second class will be less than a constant times $n^2 P(x_i > c)\max_{0<|i-j|\le m} P[(x_i > c), (x_j > c)]$, and so on until we reach the $q$th class, where the sum is $[n^q/q! + O(n^{q-1})]\,P(x_i > c)^q$. Thus, by (1), the only terms contributing to the sum as $n \to \infty$ are those of the $q$th class; these yield $\xi^q/q!$ asymptotically.

Thus for any even integer $l$ we have shown that

$$\sum_{q=0}^{l-1} \frac{(-\xi)^q}{q!} \le \lim_{n\to\infty} P[x_i \le c_n(\xi);\ i = 1, \ldots, n] \le \sum_{q=0}^{l} \frac{(-\xi)^q}{q!}.$$

Since both bounds tend to $e^{-\xi}$ as $l \to \infty$, this proves the theorem.
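The theorem can be illustrated by simulation. The sketch below assumes NumPy. Purely as an example, it uses the 1-dependent moving average $x_i = (z_i + z_{i+1})/\sqrt2$ of independent standard normal $z_i$. This process has standard normal marginals and satisfies (1) by the argument that follows. The threshold $c_n(\xi)$ is found by bisection so that $nP(x_i > c_n) = \xi$. For large $n$ the simulated $P[\max_i x_i \le c_n(\xi)]$ should be near $e^{-\xi}$. Convergence is slow, so some discrepancy at finite $n$ is to be expected.

```python
import math
import numpy as np

def threshold(n, xi):
    """c_n(xi) with n * P(x > c) = xi for a standard normal x, found by bisection."""
    target = xi / n
    lo, hi = 0.0, 40.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if 0.5 * math.erfc(mid / math.sqrt(2.0)) > target:
            lo = mid  # tail still too heavy: the threshold must be larger
        else:
            hi = mid
    return 0.5 * (lo + hi)

def prob_max_below(n, xi, reps, rng, chunk=500):
    """Monte Carlo estimate of P[x_i <= c_n(xi), i = 1..n] for the 1-dependent
    moving average x_i = (z_i + z_{i+1}) / sqrt(2) with i.i.d. N(0, 1) z_i."""
    c = threshold(n, xi)
    hits = 0
    for start in range(0, reps, chunk):
        m = min(chunk, reps - start)
        z = rng.standard_normal((m, n + 1))
        x = (z[:, :-1] + z[:, 1:]) / math.sqrt(2.0)
        hits += int(np.count_nonzero(x.max(axis=1) <= c))
    return hits / reps

rng = np.random.default_rng(2)
p_hat = prob_max_below(2000, 1.0, 4000, rng)
print(p_hat, math.exp(-1.0))
```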
To show that this theorem covers a class of stochastic processes of practical interest, it is shown next that the condition (1) of the theorem is true in strictly stationary processes which are normal. For this, it suffices to show that

$$\lim_{c\to\infty} \frac{P[(x > c), (y > c)]}{P(x > c)} = 0, \tag{2}$$

where $x$ and $y$ have a bivariate normal distribution with means zero, variances unity and covariance $\rho$, with $|\rho| < 1$. Now

$$P[(x > c), (y > c)] = \frac{1}{2\pi\sqrt{1 - \rho^2}} \int_c^\infty\!\!\int_c^\infty \exp\left[-\frac{x^2 - 2\rho xy + y^2}{2(1 - \rho^2)}\right] dx\,dy.$$

The substitution $x = r/c + c$ and $y = t/c + c$ leads to

$$P[(x > c), (y > c)] = \frac{\exp[-c^2/(1 + \rho)]}{2\pi c^2\sqrt{1 - \rho^2}} \int_0^\infty\!\!\int_0^\infty \exp\left[-\frac{r^2 - 2\rho rt + t^2}{2c^2(1 - \rho^2)} - \frac{r + t}{1 + \rho}\right] dr\,dt \sim \frac{(1 + \rho)^2}{2\pi c^2\sqrt{1 - \rho^2}}\exp\left(-\frac{c^2}{1 + \rho}\right), \quad c \text{ large}.$$

Since $P(x > c) \sim (c\sqrt{2\pi})^{-1}\exp(-\tfrac12 c^2)$, the ratio in (2) behaves like a constant times $c^{-1}\exp[-\tfrac12 c^2(1 - \rho)/(1 + \rho)]$, and statement (2) follows.
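The limit (2) and the large-$c$ approximation can be checked numerically. This sketch uses only the Python standard library, and the helper names are our own. It computes $P[(x > c), (y > c)]$ as the one-dimensional integral $\int_c^\infty \phi(x)\,P(y > c \mid x)\,dx$, where $y \mid x \sim N(\rho x, 1 - \rho^2)$. It then prints the ratio in (2) together with the ratio of the exact value to the asymptotic expression.

```python
import math

def normal_tail(x):
    """P(x > c) for a standard normal variable."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def joint_tail(c, rho, n=4000, width=12.0):
    """P[(x > c), (y > c)] for a standard bivariate normal with correlation rho, |rho| < 1,
    as the integral over x in [c, c + width] of phi(x) P(y > c | x) (Simpson's rule)."""
    sd = math.sqrt(1.0 - rho * rho)
    def f(x):
        return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi) * normal_tail((c - rho * x) / sd)
    h = width / n
    total = f(c) + f(c + width)
    total += 4 * sum(f(c + (2 * i - 1) * h) for i in range(1, n // 2 + 1))
    total += 2 * sum(f(c + 2 * i * h) for i in range(1, n // 2))
    return total * h / 3

def asymptotic_joint_tail(c, rho):
    """Large-c approximation (1 + rho)^2 exp[-c^2/(1 + rho)] / (2 pi c^2 sqrt(1 - rho^2))."""
    return ((1 + rho) ** 2 * math.exp(-c * c / (1 + rho))
            / (2 * math.pi * c * c * math.sqrt(1 - rho * rho)))

rho = 0.5
ratios = {}
for c in (2.0, 4.0, 6.0, 8.0):
    exact = joint_tail(c, rho)
    ratios[c] = exact / normal_tail(c)
    print(c, ratios[c], exact / asymptotic_joint_tail(c, rho))
```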


Acknowledgement. The author wishes to record his gratitude to a referee for suggesting a change in the assumptions made in an earlier form of this paper.
EXPRESSION OF THE k-STATISTICS $k_9$ AND $k_{10}$ IN TERMS OF POWER
SUMS AND SAMPLE MOMENTS


By M. Zia ud-Din


Panjab University, Lahore, Pakistan 
The k statistics are of interest to workers in the theory of sampling distribu- 
tions and moment statistics. They are related also to certain aspects of the theory 
of numbers and combinatory analysis, as indicated by Dressel [1]. 
The k statistics were introduced by Fisher in 1928 [2] to estimate the cumulants or Thiele semi-invariants of a population. Dressel [1] has given a table of the $k_r$ ($r = 1, 2, \ldots, 8$) in terms of the sums $s_r$ of the $r$th powers of the observations in a sample of size $n$. This note adds $k_9$ and $k_{10}$ to those available in print.

One may readily obtain $k_9$ and $k_{10}$ in terms of the sample moments $m_r$ about the sample mean by replacing $s_1$ by zero and $s_r$ ($r > 1$) by $nm_r$ in the following expressions.
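The substitution rule in the preceding paragraph can be illustrated on low orders. The sketch below uses exact rational arithmetic. It relies on the standard power-sum formulas for $k_2$, $k_3$ and $k_4$, not on the $k_9$ and $k_{10}$ expressions of this note. It checks that replacing $s_1$ by 0 and $s_r$ by $nm_r$ gives the same values, and that all three statistics vanish for a sample of equal observations. The latter is a simple consistency check of our own choosing and is not necessarily Dressel's.

```python
from fractions import Fraction

def k_from_power_sums(n, s1, s2, s3, s4):
    """k2, k3, k4 in terms of the power sums s_r (standard formulas, n >= 4)."""
    k2 = (n * s2 - s1**2) / Fraction(n * (n - 1))
    k3 = (n**2 * s3 - 3 * n * s2 * s1 + 2 * s1**3) / Fraction(n * (n - 1) * (n - 2))
    k4 = ((n**3 + n**2) * s4 - 4 * (n**2 + n) * s3 * s1 - 3 * (n**2 - n) * s2**2
          + 12 * n * s2 * s1**2 - 6 * s1**4) / Fraction(n * (n - 1) * (n - 2) * (n - 3))
    return k2, k3, k4

def power_sums(xs):
    return [sum(Fraction(x) ** r for x in xs) for r in range(1, 5)]

def central_moments(xs):
    """m2, m3, m4 about the sample mean."""
    n = len(xs)
    mean = sum(Fraction(x) for x in xs) / n
    return [sum((Fraction(x) - mean) ** r for x in xs) / n for r in range(2, 5)]

data = [1, 2, 3, 4, 10]
n = len(data)
direct = k_from_power_sums(n, *power_sums(data))
m2, m3, m4 = central_moments(data)
via_moments = k_from_power_sums(n, 0, n * m2, n * m3, n * m4)
print(direct == via_moments)                       # True
print(k_from_power_sums(n, *power_sums([7] * n)))  # all zero for a constant sample
```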

The expressions for $k_9$ and $k_{10}$ were obtained by following Kendall [3] and using
tables of the symmetric functions [4]. The work has been carefully checked. A 
fundamental check given by Dressel has been successfully applied to both ex- 
pressions. This check has revealed a correction for Le as given by Dressel: the 
coefficient of (5) (1) in Le should be 


$$-15(n^4 + 2n^3 - 7n^2 + 4n).$$


It is found that, with $n^{(r)} = n(n-1)\cdots(n-r+1)$, $n^{(9)}k_9 =$
= (n® + 219n? + 3721n* + 6189n* — 7250n4 + 2160n'*)s, 
— O(n? + 219n* + 3721n® + 6189n4 — 7250n* + 2160n*) sos; 
~— 36(n7 + 93n* + 277n® — 1917n* + 2746n* — 1200n?)s78 
+ 72(n* + 156n5 + 1999n* + 2136n* — 2252n? + 480n)s;s{ 
84(n? + 33n® — 83n5 + 543n* — 1214 + 720n?) ses, 
- 504(n*® + 63n5 + 97n* — 687n* + 766n? — 240n) 86828; 
504(n5 + 94n4 + 731n3 + 254n? — 240n)sesi 
126(n? + 9n* + 61n> — Wnt + 370n? — 240n?) 858, 
- 1008(n* + 21n5 — 1ln* + 171n* — 422n? + 240n)s85858; 
756(n® + 18n5 — 113n4 + 198n? — 104n*)s583 
4536(n> + 34n* — 9n? — 106n? + 80n)ss8081 
- 3024(n* + 49n3 + 176n* — 16n)sssi 
+ §30(n* +9n5 + 61n* — 201n* + 370n? 240n ais) 
2520(n* —5n* + 4n?)848582 
- 7560(n5 + 10n* + 15n? — 10n* — 16n)s48581 
11340(n> +6n‘ — 41n* + 66n* — 32n)s,s38; 
- 30240(n* +14n' 19n? + 4n)s48281 
- 15120(n* + 21n? + 20n) 8481 
- 560(n*® 6n> +31n* — 66n? + 40n?)s} 
15120(n’ — 2n* + 7n3 — 22n? + 16n)sis08) 
20160(n* + 4n* + lin? — 16n)sis} 
7560(n§ — 6n* + 1ln® — 6n?) 8,8 
- 90720(n* —n?® in? +- 4n)s,s287 
151200(n3 + 3n? $n) 838281 
60480(n? + 6n)s38} + 22680(n* — 6n* + 11n? — 6n)s38; 
151200(n? — 3n?2 + 2n)s3s; + 272160(n? — n)s3s; 


181440n 828; + 40320 s? 
Similarly it is found that $n^{(10)}k_{10} =$


n 


* + 466n*® +-15706n7 4 


10(n® + 466n7 + 15706n* 


7 2976n° 


41171n° 


72976n® 




—- 41186n* + 45624n? 
41171n4 


4$1186n? 


+ 45624n? — 


- 12096n?) 810 


12096n) 808; 


45(n® + 212n7? + 2428n8 9166n® + 859n* + 27098n3 


- 33528n? + 12096n) 3482 


90(n? + 339n® + 9067n* + 31905n* — 20156n3 — 7044n? + 6048n)s,87 


120(n* + 88n? + 40n® + 526n5 + 2719n4 18758n3 +- 27480n? 12096n ) 878 


+ 150n* + 1234n5 $320n* + 1789n* + 4170n? — 3024n)s7808, 


ji ain ¢ - a naan oe 3 
7 + 213n5 + 3845n* + 7755n? 5526n? +- 432n)s78; 


210(n* + 32n? + &8n*® + 734n° 5441 n* + 17378n3 24888n? + 120967 ) 8684 


1680 (n? 60n® + 64n® + 630n4 13613 690n? +- 1296n) 86838; 


1260 (n'? 57n® 203n5 165n* + 2794n* — 3912n? + 1728n 8682 


7560 (n® 89n® + 365n* 1385n* + 1074n? 144n) 86828; 


5040 (nn 120n4 + 1235n* + 900n? — 576n) 8681 


126(n* + 16n? + 256n° 1274n° + 5959n4 16886n* + 24024n? 


246n? 


1740n? 


120967 ) 8; 


2520 (n! 24n® + 172n® 270n* + 259n* + 432n ) 858481 


5OAO(n? 15n® 1Oln® + 405n4 1196n' 8647 ) 858382 


15120(n* + 33n® + 45n* + 255n' 766n? + 432n) 85858; 


22680 (n® 4+ 20n° 135n4 Lin? + 134n? 144n ) 8,898; 


60180 (n> + 45n4 35n8 225n? 144n) 85828) 


30240 (n4 6On* + 275n?) 858) 


3150? 3n® + 31n° 375n* 4+ 1264n3 1788n? + 864n)s;s 


9450 (n* 17n® + 125n* 305n ® + 594n? $32) 8481 


12000n' 3n* + 25n5 1on* 26n* + 48n?)s4s; 


on" 


75600 (1 15n4 jn 


L4n?) 8483828) 


LOOSOO (n® l5n* + 35n4 15n? SON ) 84858 


LSOOO (n 5d5n* + 2153 306n? 4+ 


l44n 3452 


226800 (n° 55n* + SOn? son 


10n* 


848281 


378000 (n4 LSn® 19n? 


151200 (n5 25n? + 30n 


16800 (n' 3n5 + 25n4 t5n 26n? {Sn )s381 


25 18n) 3383 


7n® + 25n* 


37800 (n® 65n + 94n? 


302400 (n 5n3 30n? 24n 


$3808} 


- > ~ 4 ‘ 24 
252000 (n* 6n3 + 17n? 24) 838) 


3 


302400 (n® 5n* + 5n' 5n? 6n ) 838981 


1512000 (n4 On $3508 


in? 4 


1814400(n* + 4n? 


on 


(7) 848) 


604800 (n? 


22680 (n 10n* + 35n8 
567000(n* — 6n? + 11n? — 6n)s3s7
2268000(n? — 3n? + 2n)sost 
- 3175200(n? n) 838} 


+ 1814400n $281 


— 362880 s;° 
THE PROBABILITY INTEGRAL OF RANGE FOR SAMPLES FROM A
SYMMETRICAL UNIMODAL POPULATION 


By J. H. Cadwell


Ordnance Board, Great Britain 


1. Summary. An asymptotic expression is given for the probability integral of range for samples from a symmetrical unimodal population. Its accuracy is investigated for the case of a normal parent population and for sample sizes from 20 to 100. Over this range errors are small, and by using a correction based on values given below the probability integral can be found with a maximum error of 0.0001. Percentage points of range in the normal case are tabled for $n$ = 20, 40, 60, 80 and 100.


2. The asymptotic expansion. The parent probability density function φ(x) is
symmetrical about x = 0, and its integral from 0 to x is denoted by Φ(x). The
p.d.f. of w, the range for a sample of size n, is


$$p(w) = n(n-1)\int_{-\infty}^{\infty}\{\Phi(x) - \Phi(x-w)\}^{n-2}\,\phi(x)\,\phi(x-w)\,dx. \tag{1}$$


Integrating with respect to w from −∞ to w gives

$$F(w) = n\int_{-\infty}^{\infty}\{\Phi(x) - \Phi(x-w)\}^{n-1}\,\phi(x)\,dx. \tag{2}$$

Hartley [2] proves that this can be transformed to

$$F(2u) = \{2\Phi(u)\}^{n} + 2n\int_{0}^{\infty}\{\Phi(x+u) - \Phi(x-u)\}^{n-1}\,\phi(x+u)\,dx. \tag{3}$$
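As a numerical sanity check (not part of the original note), the sketch below evaluates the right-hand sides of (2) and (3) for a standard normal parent by composite Simpson quadrature and shows that they agree. It assumes, as in the text, that Φ is the integral of φ from 0; truncating the infinite ranges at ±12 and using 4000 subintervals are illustrative choices, and the function names are hypothetical.

```python
import math

def phi(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi0(x: float) -> float:
    """Integral of phi from 0 to x (the Phi of the text); an odd function of x."""
    return 0.5 * math.erf(x / math.sqrt(2.0))

def simpson(f, a: float, b: float, m: int = 4000) -> float:
    """Composite Simpson rule on [a, b] with m (even) subintervals."""
    h = (b - a) / m
    total = f(a) + f(b)
    for i in range(1, m):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0

def F_eq2(n: int, w: float, L: float = 12.0) -> float:
    """Right-hand side of (2): n * int {Phi(x) - Phi(x - w)}^(n-1) phi(x) dx."""
    return n * simpson(lambda x: (Phi0(x) - Phi0(x - w)) ** (n - 1) * phi(x), -L, L)

def F_eq3(n: int, u: float, L: float = 12.0) -> float:
    """Right-hand side of (3), Hartley's form, which should equal F(2u)."""
    tail = simpson(lambda x: (Phi0(x + u) - Phi0(x - u)) ** (n - 1) * phi(x + u), 0.0, L)
    return (2.0 * Phi0(u)) ** n + 2.0 * n * tail

for n, u in [(2, 0.5), (5, 1.0), (20, 1.8)]:
    print(n, u, F_eq2(n, 2.0 * u), F_eq3(n, u))
```

For n = 2 the range is |X1 − X2|, which is N(0, 2) folded, so F(w) = erf(w/2); that gives an independent check on the quadrature.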


Since φ(x) is unimodal, the integrand in (3) is greatest at x = 0, the lower limit
of integration, and decreases rapidly to zero as x increases. This suggests applying
the methods used in [1] to obtain asymptotic series for similar integrals. We
have


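$$\{\Phi(x+u) - \Phi(x-u)\}^{n-1} = \{2\Phi(u)\}^{n-1}\{1 + A(u)x^{4} + \cdots\}\exp\left\{\frac{(n-1)\phi'(u)}{2\Phi(u)}\,x^{2}\right\},$$

$$\phi(x+u) = \phi(u)\{1 + B(u)x^{3} + \cdots\}\exp\left\{\frac{\phi'(u)}{\phi(u)}\,x + \frac{1}{2}\left(\frac{\phi''(u)}{\phi(u)} - \frac{\phi'(u)^{2}}{\phi(u)^{2}}\right)x^{2}\right\}.$$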
Ignoring all but the first term in each series, we substitute in (3) to obtain 


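$$F(2u) \approx \{2\Phi(u)\}^{n} + 2\sqrt{2\pi}\,nk\,\phi(u)\{2\Phi(u)\}^{n-1}\left\{\tfrac{1}{2} - \Phi\!\left(-\frac{k\phi'(u)}{\phi(u)}\right)\right\}\exp\left\{\tfrac{1}{2}k^{2}\left(\frac{\phi'(u)}{\phi(u)}\right)^{2}\right\}, \tag{4}$$

where

$$k^{-2} = \left(\frac{\phi'(u)}{\phi(u)}\right)^{2} - \frac{\phi''(u)}{\phi(u)} - (n-1)\frac{\phi'(u)}{\Phi(u)}.$$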
When $\phi(x) = e^{-x^{2}/2}/\sqrt{2\pi}$, we find that (4) reduces to

$$F(2u) \approx \{2\Phi(u)\}^{n} + 2nk\{2\Phi(u)\}^{n-1}\exp\{-\tfrac{1}{2}u^{2}(1-k^{2})\}\{\tfrac{1}{2} - \Phi(uk)\}, \tag{5}$$

where

$$k^{-2} = 1 + (n-1)\,u\,\phi(u)/\Phi(u).$$


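In this case it is easy to include a further term, which results in the last
bracket of (5) being replaced by

$$\{\tfrac{1}{2} - \Phi(uk) - (n-1)k^{4}P(u)Q(uk)\}, \tag{6}$$

where

$$P(x) = \frac{x^{2}}{8}\left(\frac{\phi(x)}{\Phi(x)}\right)^{2} + \frac{x^{3} - 3x}{24}\,\frac{\phi(x)}{\Phi(x)},$$

$$Q(x) = (x^{4} + 6x^{2} + 3)\{\tfrac{1}{2} - \Phi(x)\} - (x^{3} + 5x)\,\phi(x).$$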
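The normal-case approximations (5) and (6) are easy to program. The sketch below is an illustration, not the author's computation: it assumes Φ is the integral of φ from 0 (so ½ − Φ(x) is an upper-tail area), takes w = 2u > 0, and compares both approximations with F(w) obtained from (2) by Simpson quadrature. The function names and the quadrature settings are hypothetical choices, and the printed errors are not the paper's tabulated values.

```python
import math

SQRT2 = math.sqrt(2.0)

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi0(x):
    """Integral of phi from 0 to x, as in the text."""
    return 0.5 * math.erf(x / SQRT2)

def P(x):
    """P(x) of (6): (x^2/8)(phi/Phi)^2 + ((x^3 - 3x)/24)(phi/Phi)."""
    r = phi(x) / Phi0(x)
    return x * x * r * r / 8.0 + (x ** 3 - 3.0 * x) * r / 24.0

def Q(x):
    """Q(x) of (6): (x^4 + 6x^2 + 3){1/2 - Phi(x)} - (x^3 + 5x) phi(x)."""
    return (x ** 4 + 6.0 * x * x + 3.0) * (0.5 - Phi0(x)) - (x ** 3 + 5.0 * x) * phi(x)

def range_cdf_approx(n, w, extra_term=True):
    """Approximation (6), or (5) if extra_term is False, to F(w); requires w > 0."""
    u = 0.5 * w
    k = 1.0 / math.sqrt(1.0 + (n - 1) * u * phi(u) / Phi0(u))   # k^-2 = 1 + (n-1) u phi(u)/Phi(u)
    bracket = 0.5 - Phi0(u * k)
    if extra_term:
        bracket -= (n - 1) * k ** 4 * P(u) * Q(u * k)
    base = 2.0 * Phi0(u)
    return base ** n + 2.0 * n * k * base ** (n - 1) * math.exp(-0.5 * u * u * (1.0 - k * k)) * bracket

def range_cdf_exact(n, w, L=12.0, m=4000):
    """F(w) from (2) by composite Simpson quadrature on [-L, L]."""
    f = lambda x: (Phi0(x) - Phi0(x - w)) ** (n - 1) * phi(x)
    h = 2.0 * L / m
    s = f(-L) + f(L) + sum((4.0 if i % 2 else 2.0) * f(-L + i * h) for i in range(1, m))
    return n * s * h / 3.0

for w in (3.0, 3.6, 4.4):
    exact = range_cdf_exact(20, w)
    print(w, round(exact, 5),
          round(range_cdf_approx(20, w, False) - exact, 5),   # error of (5)
          round(range_cdf_approx(20, w) - exact, 5))          # error of (6)
```

With n = 1 both approximations return exactly 1 (k = 1 and the extra term vanishes), matching the remark in Section 3.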

3. Accuracy in the normal case. Although w is not defined when n = 1, expression
(2) gives F(w) the formal value unity for all w. This is also the value given by
(5) or (6). Thus our expression, besides being asymptotically correct, also gives
the exact value when n = 1. Hence, as in [1], errors will at first rise with increasing
n and then fall asymptotically to zero.

The following maximum errors of (5) and (6) are the differences between the
approximate values and exact values, the latter obtained by evaluating the p.d.f.
from (1) and then finding F(w) by quadrature.


Sample size, n:           20          [illegible]   100
Maximum error of (5):     +0.0031     +0.0040       +0.0043
Maximum error of (6):     −0.00052    −0.00070      0.00075


By using (6), results of reasonable accuracy are obtained. Table I gives corrections,
in units of the fourth decimal place, to be added to the approximate value
given by (6), for five sample sizes. The corrections are given as functions of the
approximate value itself, rather than of w, to make interpolation for n much
simpler. By plotting the correction against the approximate probability on
arithmetical probability paper, we can interpolate graphically for n and the
approximate value. This will enable the probability integral to be found with
an error that should not exceed 0.0001, and will usually be less than 0.00005.


TABLE I
Corrections (×10⁴) to be applied to the approximate value obtained from equation (6),
for samples of size n


Value obtained from equation (6)


TABLE II
Percentage points of range (w) for samples of various sizes (n) from normal
populations of unit standard deviation

Percentage points
.250   .500   .750   .950   .990
3.69 a if 015.65
27 D.4 506.09
59 if 8 5.76)/6.34
81 5.24/5.67 5.95/6.51
97 5.3915 5.096 .64

Table II, giving percentage points of range found by quadrature, will assist 
in making preliminary estimates. Plotting (w — ®) against n™“” for a given 
percentage level, permits interpolation for other values of n to be made with 
accuracy. Values of @ are tabled in [3]. 


Acknowledgement. The author is indebted to the Chief Scientist, British 
Ministry of Supply, for permission to publish this note. 
ABSTRACTS OF PAPERS


(Abstracts of papers presented at the Montreal meeting of the Institute, September 10-13, 1954) 


1. On Quadratic Estimates of Variance Components in Balanced Models, 
A. W. Wortham, Chance Vought Aircraft and Oklahoma A and M College. 


A balanced model is defined as a model whose analysis of variance mean squares are
symmetric in the squares of the observations. Included in this class of models are: (1) Completely
Randomized, (2) Randomized Blocks, (3) Latin Squares, (4) Graeco-Latin Squares,
(5) Split Plots, (6) Factorial Arrangements, etc.


The "analysis of variance estimates" of the variance components are the estimates
obtained by solving the system of equations which results when the observed and expected
mean squares in the analysis of variance table are equated. For any infinite population
let the general balanced model be $y_{ij\cdots p} = \mu + \sum_{k} A_{k} + e_{ij\cdots p}$, where $\mu$ is a
constant, and the $A_k$ and $e_{ij\cdots p}$ are independent random variables with zero means, finite fourth
moments, and variances $\sigma_k^2$ and $\sigma^2$ respectively. Let $\hat\sigma_k^2$ and $\hat\sigma^2$ be "the analysis of variance
estimates" of the variance components $\sigma_k^2$ and $\sigma^2$. It is shown that the quadratic estimate
of $\sum_k g_k\sigma_k^2$ ($g_k$ known) which is unbiased, independent of $\mu$, and has minimum variance is
given by $\sum_k g_k\hat\sigma_k^2$. That is, the best quadratic unbiased estimate of a linear
combination of the variance components is given by the same linear combination of "the analysis
of variance estimates" of the variance components.
2. The Coefficients in the Best Linear Estimate of the Mean in Symmetric
Populations, A. E. Sarhan, University of North Carolina.


In a previous paper ("Estimation of the Mean and Standard Deviation by Order
Statistics" by A. E. Sarhan, Ann. Math. Stat. Vol. 25 (1954), pp. 317-328) the best linear
estimates of the mean of rectangular, triangular and double exponential populations were
worked out. By considering some other symmetric distributions with different shapes, it is
found that the coefficients in the estimates form a sequence. From the sequence, it is
observed that the coefficients in the estimates are influenced by the shape of the distribution.
The variances of the estimates are also so affected.


3. Distribution of Linear Contrasts of Order Statistics, Jacques St. Pierre, 
University of North Carolina. 


Consider n + 1 independent normal populations with unknown means $m_0, m_1, \ldots, m_n$,
respectively, and with a common known variance $\sigma^2 = 1$ (say). Suppose a sample of size N
is available from each population, and let $\bar x_{(0)} > \bar x_{(1)} > \cdots > \bar x_{(n)}$ be the ordered sample
means. Consider the linear contrasts $z = \bar x_{(0)} - c_1\bar x_{(1)} - \cdots - c_n\bar x_{(n)}$, where $\sum_{i=1}^{n} c_i = 1$ and
$c_i \ge 0$ $(i = 1, 2, \ldots, n)$. The probability density function of the contrasts z is derived
under the null hypothesis $H_0: m_0 = m_1 = \cdots = m_n$. The density of the contrasts z is also
obtained in the case of three populations, under the hypothesis $-\infty < m_2 \le m_1 \le m_0 < +\infty$.
Particular hypotheses are considered and tables are given. Finally, the particular contrast
$Y = \bar x_{(0)} - \bar x_{(1)}$ is considered in the general case.


4. Note on Fourier Periodogram Analyses of Time Series, B. F. Kimball, 
New York State Public Service Commission. 


R. A. Fisher's treatment of the probability distribution of the squares of the amplitudes
of the Fourier harmonics, $R_n^2$, is followed. One deals with a time series $y_i$ of N observations.
The null hypothesis is taken as the hypothesis that $E(y_i) = 0$ and that the $y_i$ are
independently and normally distributed with constant variance. Let the index n of $R_n^2$ denote the
index of the fitted harmonic, so that N/n is the period of this harmonic. If N/n is
an integer, one can replace the series of N terms by one of N/n means $\sum y_i/n$, where the $y_i$
are of the same phase in period N/n. The harmonics of this series are selected harmonics
of the original series. This paper examines the implications of such a breakdown for
testing the significance of the short-period harmonics relative to the null hypothesis.
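The breakdown described above can be checked numerically. The sketch below is an illustration, not Kimball's procedure: it folds a series of N terms into N/n phase means and confirms that harmonic q of the folded series reproduces harmonic qn of the original. It assumes the coefficient convention A_h = (2/N)Σ y_t cos(2πht/N), B_h likewise with sine, and squared amplitude R² = A_h² + B_h²; the function names and the simulated series are hypothetical.

```python
import math
import random

def harmonic(y, h):
    """Fourier coefficients (A_h, B_h) of harmonic h (period N/h) of the series y."""
    N = len(y)
    a = 2.0 / N * sum(v * math.cos(2.0 * math.pi * h * t / N) for t, v in enumerate(y))
    b = 2.0 / N * sum(v * math.sin(2.0 * math.pi * h * t / N) for t, v in enumerate(y))
    return a, b

def fold(y, n):
    """Replace N terms by N/n means of the terms that share a phase in period N/n."""
    N = len(y)
    if N % n:
        raise ValueError("N/n must be an integer")
    P = N // n
    return [sum(y[j + m * P] for m in range(n)) / n for j in range(P)]

random.seed(1)
y = [random.gauss(0.0, 1.0) for _ in range(24)]   # N = 24
folded = fold(y, 4)                               # n = 4, period N/n = 6
for q in (1, 2):
    r2_orig = sum(c * c for c in harmonic(y, 4 * q))    # R^2 for harmonic 4q of y
    r2_fold = sum(c * c for c in harmonic(folded, q))   # R^2 for harmonic q of the folded series
    print(q, r2_orig, r2_fold)
```

The identity holds because cos(2π·qn·t/N) depends only on the phase of t within the period N/n, so averaging over the n cycles leaves those coefficients unchanged.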


5. Univariate Two-Population Distribution-Free Discrimination, David S.
Stoller, RAND Corporation.


A univariate random variable, z, is defined by the composite cumulative distribution
function $F(z) = \theta F_1(z) + (1-\theta)F_2(z)$, $0 \le \theta \le 1$. Restrict $F_1$ and $F_2$ to be such that the
optimum a priori discriminating regions are $S_1 = \{z \mid z \le \phi\}$ and $S_2 = \{z \mid z > \phi\}$, where
$\phi$ is unique. Optimum discrimination is defined as that which maximizes the probability
of correctly classifying z. Denote this maximum probability by $Q(\phi)$. Given an independent
random sample of size N from F(z), each member of which is classifiable, a
distribution-free estimate of $\phi$, denoted by $\phi^*$, is constructed as follows. Let $t(z) = k(z) - h(z)$,
where k(z) is the number of observations from the first population (i.e., that defined
by $F_1$) which are less than or equal to z, and h(z) is similarly defined for the second
population. Then $\phi^*$ is any value of z that maximizes t(z). The estimate $\phi^*$ possesses two
asymptotically optimum properties: (1) the probability of correct classification induced by using
$\phi^*$ instead of $\phi$ converges in probability to $Q(\phi)$, and (2) the quantity $t(\phi^*)$ may be used to
construct an estimate of $Q(\phi)$ which converges in probability to $Q(\phi)$.
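The construction of the estimate can be sketched in a few lines. This is a minimal sketch, assuming labelled observations given as (z, label) pairs: t(z) is a step function that changes only at observed values, so it suffices to scan the sorted sample (absorbing ties before evaluating t). When several values maximize t, the smallest is returned. The plug-in estimate of Q(φ) shown here, the share of the sample correctly classified by the rule "z ≤ φ* → population 1", which equals (t(φ*) + N₂)/N, is our illustrative choice and not necessarily the one in the paper; the function name is hypothetical.

```python
import math

def stoller_cut(sample):
    """Distribution-free cut point phi* from a labelled sample.

    sample: iterable of (z, label) pairs, label 1 or 2 (the population the
    observation came from). Returns (phi_star, t_max, q_hat), where
    t(z) = k(z) - h(z) counts population-1 minus population-2 observations <= z.
    """
    data = sorted(sample)
    n2 = sum(1 for _, lab in data if lab == 2)
    best_z, best_t = -math.inf, 0      # t(z) = 0 below every observation
    t, i = 0, 0
    while i < len(data):
        z = data[i][0]
        while i < len(data) and data[i][0] == z:   # absorb ties at z
            t += 1 if data[i][1] == 1 else -1
            i += 1
        if t > best_t:
            best_z, best_t = z, t
    # Hypothetical plug-in estimate of Q(phi): share of the sample that the rule
    # "z <= phi* -> population 1" classifies correctly, i.e. (t(phi*) + N2) / N.
    q_hat = (best_t + n2) / len(data)
    return best_z, best_t, q_hat

print(stoller_cut([(1, 2), (2, 1), (3, 1), (4, 2), (5, 1), (6, 2)]))
```

A returned cut point of −∞ means that no observed value gives t(z) > 0, so any z below the smallest observation maximizes t.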


6. New Types of Easily Constructed Partially Balanced Incomplete Block 
Designs, John Mandel and Marvin Zelen, National Bureau of Standards. 


In the planning of experiments in the physical sciences one is often confronted with natural 
limitations on the size of experimental blocks. Therefore, the use of incomplete blocks is 
becoming ever more widespread in this type of application. In this paper a type of partially 
balanced incomplete block design is introduced, the construction of which consists in re- 
placing each treatment of a balanced design with a group of treatments which themselves 
form a balanced design. A large class of designs thus becomes at once available by
combining Latin Squares with Youden Squares, or Youden Squares with Youden Squares. An
important property of these designs is the possibility of two-way elimination of error
(according to rows and columns). A general formula is given for the least-squares estimation
of corrected treatment effects. Because of the flexibility of the proposed designs, their ease 
of construction, and simplicity of analysis, they are well adapted to experiments in physical 
and chemical laboratories. Investigations are in progress to extend the results to designs 
formed from combinations of chain block and other partially balanced incomplete block 
designs. 
7. The Stochastic Convergence of a Function of Sample Successive Differences,
Lionel Weiss, University of Virginia.


Let f(x) be a bounded density function with at most a finite number of discontinuities,
and such that there are two finite numbers, A and B (A < B), with f(x) nondecreasing in
the interval $(-\infty, A)$ and nonincreasing in the interval $(B, \infty)$. Let $X_1, X_2, \ldots, X_n$ be
independent chance variables each with the density f(x). Define $Y_1 \le Y_2 \le \cdots \le Y_n$ as
the ordered values of $X_1, X_2, \ldots, X_n$; $T_i$ as $Y_{i+1} - Y_i$; and $R_n(t)$ as the proportion of
the values $T_1, \ldots, T_{n-1}$ not greater than $t/(n-1)$. S(t) denotes $1 - \int_{-\infty}^{\infty} f(x)e^{-tf(x)}\,dx$,
and V(n) denotes $\sup_{t>0}|R_n(t) - S(t)|$. Then it is shown that V(n) converges
stochastically to zero as n increases. This result can be used to demonstrate the stochastic
convergence of various functions of $T_1, \ldots, T_{n-1}$, including some which have been discussed
in the literature.
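A small simulation makes the definitions of R_n(t), S(t) and V(n) concrete. This sketch assumes a uniform parent on [0, 1] (an illustrative choice satisfying the conditions above), for which f = 1 on the support and S(t) = 1 − e^(−t); the supremum is approximated by a maximum over a finite grid of t, and the function names are hypothetical.

```python
import bisect
import math
import random

def spacing_discrepancy(xs, S, t_grid):
    """Approximate V(n) = sup_t |R_n(t) - S(t)| by its maximum over t_grid."""
    ys = sorted(xs)
    n = len(ys)
    # (n - 1) * T_i <= t is the same event as T_i <= t / (n - 1)
    scaled = sorted((n - 1) * (ys[i + 1] - ys[i]) for i in range(n - 1))
    worst = 0.0
    for t in t_grid:
        R = bisect.bisect_right(scaled, t) / (n - 1)   # proportion of T_i not greater than t/(n-1)
        worst = max(worst, abs(R - S(t)))
    return worst

def S_uniform(t):
    """S(t) = 1 - int f(x) exp(-t f(x)) dx for f = 1 on [0, 1]."""
    return 1.0 - math.exp(-t)

grid = [i / 100 for i in range(1001)]   # t from 0 to 10
random.seed(0)
for n in (100, 1000, 20000):
    xs = [random.random() for _ in range(n)]
    print(n, round(spacing_discrepancy(xs, S_uniform, grid), 4))
```

The printed discrepancies should shrink as n grows, in line with the stochastic convergence of V(n) to zero.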


8. On a Modified $T^2$ Problem, Ingram Olkin, Michigan State College and S.
S. Shrikhande, College of Science, Nagpur.


Consider two independent random vectors $X = (X_1, \ldots, X_p)$, $Y = (Y_1, \ldots, Y_p)$,
each obeying a p-variate normal probability law with $EX = (\theta_1, \ldots, \theta_k, \mu_{k+1}, \ldots, \mu_p)$,
$EY = (\theta_1, \ldots, \theta_k, \nu_{k+1}, \ldots, \nu_p)$, and the same covariance matrix $\Sigma$, with all the parameters
unknown. On the basis of samples of n and m observations from X and Y, respectively,
the hypothesis $H_0: \mu_i = \nu_i$ against $H_1: \mu_i \ne \nu_i$ $(i = k+1, \ldots, p)$ is to be tested. The
problem is equivalent to the case where X and Y are random vectors with means
$EX = (0, \ldots, 0, \delta_{k+1}, \ldots, \delta_p)$, $EY = (0, \ldots, 0)$, and the same covariance matrix $\Sigma$. On the basis
of one and n observations from X and Y, respectively, $H_0: \delta_i = 0$ against $H_1: \delta_i \ne 0$
$(i = k+1, \ldots, p)$ is to be tested. The likelihood ratio statistic is obtained and its
distribution under $H_0$ and $H_1$ derived. If k = 0, the statistic reduces to Hotelling's $T^2$ statistic.


9. The Validity of Sheppard’s Corrections for Grouping, F. J. Anscombe, 


University of Cambridge and Princeton University. 


The moments of an absolutely continuous one-dimensional distribution are to be
compared with the moments of the same distribution when it has been "grouped" with constant
grouping interval. The characteristic function $\phi^*(t)$ of the grouped distribution may be
expressed as a Fourier series in terms of the characteristic function $\phi(t)$ of the original
distribution. The expansion is similar to those for moments given by R. A. Fisher (Philos.
Trans. Roy. Soc. London, Ser. A, Vol. 222 (1922), pp. 309-368, section 5), but requires no
condition on the original distribution other than absolute continuity of the distribution
function F(x). Sheppard's formulas are obtained when the periodic terms in the series are
neglected. The periodic terms are small if $|\phi(t)|$ is small for large t, and this condition is
related to the differentiability of F(x) for all values of x. The emphasis that has often been
placed on the differentiability of F(x) at infinity or at the ends of a finite range is
misleading, because these points are not specially important.
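To see Sheppard's correction at work, the sketch below groups a normal distribution N(0, σ²) into cells of width h centred at offset + jh, places each cell's probability at its midpoint, and compares the grouped variance with σ² + h²/12. The normal parent, the cell alignment and the truncation at ±12σ are illustrative assumptions. For a normal parent the periodic terms involve the characteristic function at multiples of 2π/h, that is exp(−2π²σ²j²/h²), so they are negligible when h is not large compared with σ.

```python
import math

def normal_cdf(x, sigma=1.0):
    """Distribution function of N(0, sigma^2)."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))

def grouped_variance(sigma, h, offset=0.0, span=12.0):
    """Variance of N(0, sigma^2) grouped into cells of width h centred at offset + j*h."""
    jmax = int(span * sigma / h) + 1
    m1 = m2 = 0.0
    for j in range(-jmax, jmax + 1):
        c = offset + j * h                                   # cell midpoint
        p = normal_cdf(c + h / 2, sigma) - normal_cdf(c - h / 2, sigma)
        m1 += c * p
        m2 += c * c * p
    return m2 - m1 * m1

for h in (0.25, 0.5, 1.0):
    g = grouped_variance(1.0, h)
    print(h, g, 1.0 + h * h / 12.0, g - h * h / 12.0)   # last column should be close to sigma^2 = 1
```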


10. Unbiased Tests Based on Unbiased Estimators, Reed B. Dawson, Depart- 
ment of Defense. 


A test of a point-hypothesis $\theta = \theta_0$ of a distribution parameter will be said to be strongly
unbiased when the power depends on $\theta$ alone and exceeds the size against all alternatives.
For any $\alpha$, $0 < \alpha < 1$, there exists a strongly unbiased test of size $\alpha$ if and only if there
exists a real-valued function $f(\theta)$ which is zero at $\theta_0$, strictly positive elsewhere, and
possesses a bounded unbiased estimator. For, if w(z) is the rejection probability corresponding
to an outcome z, let $f = Ew - \alpha$; if f is given, take $w = \alpha + K\hat f$, where $\hat f(z)$ is the estimator
and K is a suitable positive constant. One application concerns a sample of n items from
the family of all distributions over the unit interval. The possible strongly unbiased tests
of a point-hypothesis on the rth moment form a bounded convex body, over which
the power is a linear functional. A second application (Mosteller's suggestion) concerns
the hypothesis of independence of two attributes in a 2 × 2 table where sampling proceeds
until a chosen cell attains a fixed quota. Powers of the determinant of the underlying
probabilities admit bounded unbiased estimation, giving unbiased tests without the Neyman
structure.


11. The Mean Square Error of the Sample Median, Harold Hotelling, Univer- 
sity of North Carolina. 


For random samples of any odd size from an arbitrary population, the ratio of the
mean square error in the sample median, regarded as an estimate of the population median, 
to the corresponding population parameter, is shown never to be less than unity. This 
lower bound is actually attained for the familiar two-point distribution with equal proba- 
bilities. The fact that in this case the accuracy, however measured, of the median of a 
large number of observations is no better than that of one random observation destroys 
the argument sometimes given that the median should be used in the absence of knowledge 
of the form of the underlying distribution. (Research sponsored by the Office of Naval 
Research at Chapel Hill, North Carolina). 


12. The Moments of the Sample Median, J. T. Chu and Harold Hotelling, 
University of North Carolina. 


Moments of medians of random samples are studied by a method involving expansion
about ½ of the inverse of the cumulative distribution function, and in other ways. Readily
calculable approximations are found, both for large and for small samples, with close upper
and lower bounds on the errors of approximation. The asymptotic behavior for large
samples is examined. Calculations are carried out for the Laplace, Cauchy and normal
distributions. (Research sponsored by the Office of Naval Research at Chapel Hill, North Carolina).


13. Distribution of the Largest Vote in Unstructured Random Balloting, Leo 
Katz, Michigan State College.


The exact distributions of the maximum vote are obtained for two balloting arrange- 
ments. In both, each person votes once at random without prior reduction of the field of 
choice by a nomination process. In the first arrangement, a person may, if he (randomly) 
wishes, vote for himself; in the second, voting is gentlemanly. The second case has direct 
application to determination of "stars" in sociometric testing. An approximation is given;
it is shown to be reasonably accurate for moderate-sized groups. 


14. Statistical Programming, D. F. Votaw, Jr., Yale University. 


Statistical programming problems arise when some of the constants in a programming 
problem are unknown but statistical information about them is available. In this paper 
several methods of statistical programming are compared in connection with a special 
linear programming problem. The application of simultaneous confidence interval estima- 
tion is discussed. (Work sponsored by the Office of Naval Research.) 
15. Exact Tests of Significance for Combining Inter- and Intra-Block Information
in Incomplete Block Designs (Preliminary Report), Marvin Zelen,
National Bureau of Standards. 


Consider an incomplete block design where the number of blocks is greater than the
number of treatments (b > v). It is then shown, under the usual assumptions for the
recovery of inter-block information, that two independent F tests of the null hypothesis
(all treatments are the same) exist, one using only inter-block information and the other
using the intra-block information. Let $F_i$ (i = 1, 2) represent the F ratio obtained for each
test, and let $1 + r_i\lambda/\sigma_i^2$ represent the ratio of the expected value of the numerator to that of
the denominator of the respective F ratios, where $\lambda = \sum_j (t_j - \bar t)^2/(v - 1)$ is a measure of the
departure from the null hypothesis (i.e., $\lambda = 0$ if $H_0$ is true); also let $p_i = P\{F \ge F_i \mid H_0\}$.
Then a combined test which seems to adjust for the differences in power of the two
independent tests is given by the region $\{p_1p_2^{\theta} \le Q\}$, where Q is chosen such that
$P\{p_1p_2^{\theta} \le Q\} = (Q - \theta Q^{1/\theta})/(1 - \theta) = \alpha$ (level of significance), and $\theta = r_2\sigma_1^2/(r_1\sigma_2^2)$.
For example, $\theta = (1 - E)\sigma^2/\{E(\sigma^2 + k\sigma_b^2)\}$ for balanced incomplete block designs, where
$\sigma^2$ is the "within block" variance, $\sigma_b^2$ the "between blocks" variance component, E is the
efficiency factor and k is the plot size. Approximations to the power function of the test
have been derived, and preliminary calculations indicate that the above critical region seems
to have greater power than weighting the individual $p_i$'s equally as in Fisher's method.
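The critical constant Q can be found numerically. This is a minimal sketch, assuming the null distribution P{p₁p₂^θ ≤ Q} = (Q − θQ^(1/θ))/(1 − θ) for independent uniform p-values with θ > 0 and θ ≠ 1. Q is obtained by bisection on (0, 1), where this distribution function rises from 0 to 1. The function names are hypothetical, and θ is simply passed in rather than estimated from the design.

```python
import math
import random

def combined_cdf(Q, theta):
    """P{p1 * p2**theta <= Q} for independent uniform p-values (theta > 0, theta != 1)."""
    if theta <= 0 or theta == 1:
        raise ValueError("theta must be positive and different from 1")
    return (Q - theta * Q ** (1.0 / theta)) / (1.0 - theta)

def critical_Q(alpha, theta, tol=1e-14):
    """Solve combined_cdf(Q, theta) = alpha for Q in (0, 1) by bisection."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if combined_cdf(mid, theta) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def combined_test(p1, p2, theta, alpha=0.05):
    """Reject H0 when p1 * p2**theta <= Q(alpha, theta)."""
    return p1 * p2 ** theta <= critical_Q(alpha, theta)

# Monte Carlo check of the size under H0 (uniform p-values).
random.seed(4)
Q = critical_Q(0.05, 0.5)
trials = 200_000
hits = sum(random.random() * random.random() ** 0.5 <= Q for _ in range(trials))
print(Q, hits / trials)   # empirical size should be close to 0.05
```

For θ = ½ the distribution function reduces to 2Q − Q², so Q = 1 − √0.95 at α = 0.05, which gives a closed-form check.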


16. Moments and Related Quantities of the Null Distribution of Linear Con- 
trasts of Order Statistics in the Case of Three Populations, Jacques St. 
Pierre, University of North Carolina. 


Consider three independent normal populations with unknown means $m_0, m_1, m_2$,
respectively, and with a common known variance $\sigma^2 = 1$, say. Suppose a sample of size N is
available from each population. Let $\bar x_{(0)} > \bar x_{(1)} > \bar x_{(2)}$ be the ordered sample means.
Consider the linear contrasts $z = \bar x_{(0)} - c\bar x_{(1)} - (1 - c)\bar x_{(2)}$, where $0 \le c \le 1$. An expression
for the kth moment about the origin is obtained. Properties of the moments and related
quantities (skewness and kurtosis coefficients) are established, considering these
quantities as functions of the nonstochastic parameter c. Tables of moments of low order are
given in cases of special interest.


17. Application of Faà di Bruno's Formula in Mathematical Statistics, Eugene
Lukacs, Office of Naval Research.


Let z = G(y) and y = f(t) be two functions such that all the derivatives of G(y) and f(t)
up to order p exist. We denote by $D_t^k\{\ \}$ the operation of determining the kth derivative
of the function in the braces with respect to t, and we write $f_k = D_t^k\{f(t)\}/k!$, $f = f_0 = f(t)$;
then $D_t^p z = D_t^p\{G[f(t)]\} = p!\sum D_y^k G(y)\, f_{k_1}^{i_1}\cdots f_{k_s}^{i_s}/(i_1!\cdots i_s!)$, where the summation is to
be extended over all partitions of p such that $i_1 + i_2 + \cdots + i_s = k$ and $i_1k_1 + i_2k_2 + \cdots +
i_sk_s = p$. This formula is due to F. Faà di Bruno [Sullo sviluppo delle funzioni, Annali
di scienze matematiche e fisiche 6 (1855), pp. 479-480]. Faà di Bruno's formula can be
applied in mathematical statistics. The relations between the cumulants and the moments
of a distribution are derived easily by means of this formula. It is also useful in the study
of R. A. Fisher's k-statistics. For instance, the explicit formula expressing the k-statistic
of order p in terms of the observations can be obtained. In addition to these familiar
results, the following theorem is proven. Let $x_1, x_2, \ldots, x_n$ be n independent observations
taken from a population with distribution function F(x), and denote by p an integer greater
than one. Assume that the pth moment of F(x) exists. The population is normal if, and only
if, the k-statistic of order p is independent of the sample mean.
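As an illustration of the first application mentioned, the sketch below applies the formula with G(y) = log y and y = f(t) = E e^(tX), so that the pth cumulant is D_t^p log f(t) at t = 0. Then D_y^k G = (−1)^(k−1)(k−1)!/y^k, which equals (−1)^(k−1)(k−1)! at y = f(0) = 1, and f_k = μ'_k/k!, where μ'_k are the raw moments. The function names are hypothetical, and exact rational arithmetic (fractions.Fraction) is used so that results can be compared exactly.

```python
from collections import Counter
from fractions import Fraction
from math import factorial

def partitions(p, largest=None):
    """Yield the partitions of p as non-increasing lists of parts."""
    if largest is None:
        largest = p
    if p == 0:
        yield []
        return
    for part in range(min(p, largest), 0, -1):
        for rest in partitions(p - part, part):
            yield [part] + rest

def cumulant(p, raw):
    """p-th cumulant from raw moments raw[1], ..., raw[p] via Faa di Bruno's formula."""
    total = Fraction(0)
    for parts in partitions(p):
        k = len(parts)                                   # k = i_1 + ... + i_s
        term = Fraction((-1) ** (k - 1) * factorial(k - 1))   # D_y^k log y at y = 1
        for part, mult in Counter(parts).items():        # part = k_j, mult = i_j
            term *= Fraction(raw[part], factorial(part)) ** mult / factorial(mult)
        total += term
    return factorial(p) * total

poisson = [1, 2, 6, 22, 94]   # raw moments 1, E X, ..., E X^4 of a Poisson(2) variable
print([cumulant(p, poisson) for p in range(1, 5)])   # every cumulant of a Poisson(2) is 2
```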
18. On Simultaneous Minimax Point Estimation, Waldo A. Vezeau and Koichi
Ito, St. Louis University. 


This paper is concerned with simultaneous minimax point estimation of all the
parameters in the multivariate distribution function of a parent population on the basis of a
sample of fixed size. Extending results due to K. Miyasawa (Bull. Math. Stat., Vol. 5 (1953),
pp. 1-17), it is shown that if the risk is a bounded function of s parameters, $\theta_1, \ldots, \theta_s$,
and their point estimates, $d_1, \ldots, d_s$, and a convex, measurable function of $d_1, \ldots, d_s$
for any fixed $\theta_1, \ldots, \theta_s$, and if the space D of $(d_1, \ldots, d_s)$ is compact and convex, then
there exists a set of simultaneous minimax point estimates of $\theta_1, \ldots, \theta_s$ in D. Applications
of this theorem are made to simultaneous minimax point estimation of the parameters in a
multinomial distribution, the mean and variance (or standard deviation) of a univariate
normal distribution, and the means, variances and covariances of a multivariate normal
distribution.


19. Estimation of Structural Parameters when the Number of Incidental Pa- 
rameters is Unbounded, J. Wolfowitz, Cornell University. 


Let $\prod_{i=1}^{m}\prod_{j=1}^{n} f(x_{ij} \mid \theta, \alpha_i)$ be the frequency function of the observed chance variables
$x_{ij}$ $(j = 1, \ldots, n;\ i = 1, \ldots, m)$, which depends upon the unknown (structural) parameter
$\theta$ and the unknown (incidental) parameters $\{\alpha_i\}$. The author proves that in general
there exists no estimator of $\theta$ which is efficient for all sequences $\{\alpha_i\}$. This verifies a
conjecture of the author's, described in the Proc. Roy. Dutch Acad. Sci., Ser. A, Vol. 56, No. 2,
and Indag. Math., Vol. 15, 1953, where a heuristic supporting argument was given.


20. On Power Properties of Certain Simultaneous Tests, K. V. Ramachandran, 
University of North Carolina. 

(1) Let $y_1, y_2, \ldots, y_k$ be k independent normal variates with $E(y_i) = \mu_i$ and $V(y_i) =
\sigma^2$ $(i = 1, 2, \ldots, k)$. The $\mu_i$ and $\sigma$ are unknown, but an independent estimate $s^2$ of $\sigma^2$ with $\nu$ d.f.
is available. To test the hypothesis $\mu_1 = \mu_2 = \cdots = \mu_k$ we have a short-cut test of Tukey
based on the studentized range. (2) Let $y_1, y_2, \ldots, y_k$ be k independent normal variates
with variances $\sigma_1^2, \sigma_2^2, \ldots, \sigma_k^2$ respectively. To test the hypothesis $\sigma_1 = \sigma_2 = \cdots = \sigma_k$
we have the $F_{\max}$ ratio test of Hartley. In this paper the following properties of the tests
are proved. The power functions of the tests depend only on k − 1 parameters, namely
$\delta_i = \mu_i - \mu_1$ $(i = 2, 3, \ldots, k)$ in case (1) and $\eta_i = \sigma_i^2/\sigma_1^2$ $(i = 2, 3, \ldots, k)$ in case (2).
The tests are completely unbiased, but the power functions do not have the monotonicity
property. A set of useful lower bounds is obtained for the power in the two situations.


Power properties of multivariate and other generalizations of these tests are being
investigated.


21. On Tests of Normality and Other Tests of Goodness of Fit Based on Distance
Methods, M. Kac, J. Kiefer, and J. Wolfowitz, Cornell University.


The authors study the problem of testing whether the common distribution function
(d.f.) of the observed independent chance variables $x_1, \ldots, x_n$ is a member of a given
class. A classical problem is concerned with the case where this class is the class of all
normal d.f.'s, and for the sake of brevity the description in this abstract will be limited
to some of the results for this problem. For any two d.f.'s F(y) and G(y) let $\delta(F, G) =
\sup_y |F(y) - G(y)|$. Let $N(y \mid \mu, \sigma^2)$ be the normal d.f. with mean $\mu$ and variance $\sigma^2$.
Define $\bar x = n^{-1}\sum x_i$, $s^2 = n^{-1}\sum (x_i - \bar x)^2$. Let $G_n(y)$ be the empiric d.f. of $x_1, \ldots, x_n$.
The authors consider, inter alia, tests of normality based on $v_n = \delta(G_n(y), N(y \mid \bar x, s^2))$
and on $W_n = \int (G_n(y) - N(y \mid \bar x, s^2))^2\, d_yN(y \mid \bar x, s^2)$. It is shown that the asymptotic power
of these tests is considerably greater than that of the optimum $\chi^2$ test. The covariance
function of a certain Gaussian process Z(t), $0 \le t \le 1$, is found. It is shown that the sample
functions of Z(t) are continuous with probability one, and that, as $n \to \infty$, $\lim P\{nW_n < a\} =
P\{W < a\}$, where $W = \int_0^1 [Z(t)]^2\,dt$. Tables of the distribution of W and of the limiting
distribution of $\sqrt{n}\,v_n$ are given. The role of various metrics is discussed.
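The two statistics are straightforward to compute from a sample. This sketch assumes distinct observations and uses the divisor n in s², as defined above. With u₍ᵢ₎ = N(x₍ᵢ₎ | x̄, s²) in increasing order, the supremum v_n is attained at a jump of G_n, and W_n follows from the standard Cramér-von Mises computing formula nW_n = 1/(12n) + Σ(u₍ᵢ₎ − (2i − 1)/(2n))². The function names and the sample values are hypothetical, and no critical values are computed here.

```python
import math

def normal_cdf(x, mu, sigma):
    """Normal d.f. N(x | mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def kkw_statistics(xs):
    """Return (v_n, W_n) for testing normality with mean and variance estimated from xs."""
    n = len(xs)
    xbar = sum(xs) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in xs) / n)   # divisor n, as in the abstract
    u = sorted(normal_cdf(x, xbar, s) for x in xs)
    # sup |G_n - N| is attained at a jump: compare u_(i) with i/n and (i-1)/n (1-based i)
    v = max(max((i + 1) / n - ui, ui - i / n) for i, ui in enumerate(u))
    # n W_n = 1/(12n) + sum (u_(i) - (2i-1)/(2n))^2 (1-based i)
    nW = 1.0 / (12 * n) + sum((ui - (2 * i + 1) / (2 * n)) ** 2 for i, ui in enumerate(u))
    return v, nW / n

print(kkw_statistics([0.3, -1.2, 2.5, 0.8, -0.4]))
```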


22. Tolerance Regions (Preliminary Report), D.A.S. Fraser and I. Guttman, 
University of Toronto. 


Three definitions are considered for tolerance regions. A "distribution-free tolerance
region" has the distribution of its probability content independent of the parameter.
A "β-content tolerance region" has probability content at least β with an assigned level
of confidence. A "β-expectation tolerance region" has probability content on the average
equal to β. For the first definition a necessary and sufficient condition has been obtained
for the characteristic function of the region. For sampling from univariate distributions
for which the order statistics are complete, the nonexistence of distribution-free tolerance
regions was obtained in the discontinuous case and some results on distribution-free
tolerance bounds were obtained in the continuous case. For the third definition an analogy with
hypothesis testing has been established by introducing a density function to indicate the
desirability that different points of a distribution be included in the region. For normal
distributions the center of the distribution was weighted more heavily than the tails and
most stringent tolerance regions obtained. For univariate distributions they were $[\bar X \pm d\sigma]$
and $[\bar X \pm d's]$, depending on whether or not $\sigma$ was known. In the multivariate case they are
based on Hotelling's $T^2$ statistic.
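For the normal case with σ known, the constant d in an interval of the form [X̄ ± dσ] can be fixed by the β-expectation requirement. This short derivation is an illustration and is not taken from the abstract. For a new observation Y, Y − X̄ is N(0, σ²(1 + 1/n)), so the expected content equals 2Φ(d/√(1 + 1/n)) − 1, where Φ is the standard normal d.f. Setting this to β gives d = z₍₁₊β₎/₂ √(1 + 1/n). The sketch checks this by Monte Carlo using Python's `statistics.NormalDist`; the function names, the seed and the number of replications are hypothetical choices.

```python
import math
import random
from statistics import NormalDist

def beta_expectation_factor(n, beta):
    """d such that [xbar - d*sigma, xbar + d*sigma] has expected content beta (sigma known)."""
    z = NormalDist().inv_cdf((1.0 + beta) / 2.0)
    return z * math.sqrt(1.0 + 1.0 / n)

def average_content(n, beta, mu=0.0, sigma=1.0, reps=100_000, seed=2):
    """Monte Carlo estimate of the expected probability content of the interval."""
    rng = random.Random(seed)
    d = beta_expectation_factor(n, beta)
    pop = NormalDist(mu, sigma)
    total = 0.0
    for _ in range(reps):
        xbar = rng.gauss(mu, sigma / math.sqrt(n))   # sampling distribution of the mean
        total += pop.cdf(xbar + d * sigma) - pop.cdf(xbar - d * sigma)
    return total / reps

print(beta_expectation_factor(10, 0.90), average_content(10, 0.90))
```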


23. Comparison of the Power of Nonparametric Two Sample Tests against 
Normal Alternatives, Benjamin Epstein, Wayne University. 


This is a sampling study in which we compare the power of run, rank sum, exceedance, 
and truncated maximum deviation two sample tests. The particular case studied involves 
normal alternatives whose distance apart is measured by the difference in population means. 
Two hundred random samples of size 10 are drawn from each population. These results 
are related to recent work of Dixon and Teichroew [Abstract, Ann. Math. Stat. Vol. 25, 
(1954), p. 175]. There are, however, these differences: (i) in the present study we assume
that the two samples have been placed (simultaneously) on life test, thus making the times
to failure available in an ordered way, and (ii) we include exceedance and truncated maximum
deviation rules among the nonparametric tests. Such rules are particularly useful in life
test situations. Experimental sampling assigns the following order to the power (best to
worst): rank sum, untruncated maximum deviation, truncated maximum deviation,
exceedance, and run. The first four power curves are fairly close together and are all
substantially better than the power curve for the run test. Also included in the paper is
experimental information on the expected number of items failed in reaching a decision when an
exceedance or truncated maximum deviation rule is used. Substantial savings in this
direction are possible. (Research sponsored by the Office of Ordnance Research, U. S. Army.)


24. On the Distribution of Radial Errors Having Normally Distributed Com- 
ponents, A. C. Cohen, Jr., University of Georgia. 


For a set of p independent random variables $x_j$ $(j = 1, 2, \ldots, p)$, each of which is
normal $(0, \sigma)$, the radial error defined as $r = (x_1^2 + x_2^2 + \cdots + x_p^2)^{1/2}$ is considered. It is well
known that the distribution of r is given by $[2r/\sigma^2]f_p(r^2/\sigma^2)$, where $f_p(\chi^2)$ is the $\chi^2$ frequency
function with p degrees of freedom. This paper is concerned with the problem of
estimating the scale parameter $\sigma$ from unrestricted (complete), truncated, and censored samples
of r. Maximum likelihood estimators are developed for each of these cases, and asymptotic
variances of the estimates are given. In the case of unrestricted samples, $pn\hat\sigma^2/\sigma^2$ has a $\chi^2$
distribution with pn degrees of freedom, where n is the number of sample observations and
$\hat\sigma$ is the maximum likelihood estimate. Tables and graphs of functions necessary for
solving the maximum likelihood estimating equations for truncated samples are given for
p = 2 and p = 3. Illustrative examples relating to target analysis studies are included.
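For complete samples, the log-likelihood from the density [2r/σ²]f_p(r²/σ²) is, up to constants, −np log σ − Σr_i²/(2σ²). It is maximized at σ̂² = Σr_i²/(pn), which is why pnσ̂²/σ² = Σr_i²/σ² is χ² with pn degrees of freedom. The sketch below covers only this unrestricted case (not the truncated or censored estimators) and checks it on simulated data. The values p = 2 and σ = 3, the seed and the function name are arbitrary illustrative choices.

```python
import math
import random

def radial_sigma_mle(r, p):
    """ML estimate of sigma from a complete sample of radial errors r in p dimensions:
    sigma_hat^2 = sum(r_i^2) / (p * n)."""
    n = len(r)
    return math.sqrt(sum(ri * ri for ri in r) / (p * n))

# Simulated check: p = 2 components, each N(0, sigma^2) with sigma = 3.
random.seed(3)
p, sigma, n = 2, 3.0, 5000
r = [math.sqrt(sum(random.gauss(0.0, sigma) ** 2 for _ in range(p))) for _ in range(n)]
print(radial_sigma_mle(r, p))   # should be close to 3
```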


25. Confidence Bounds on Departures from a Particular Kind of Multi-Collinearity
of Means, S. N. Roy, University of North Carolina.


For k (p + q)-variate populations $N(\xi_i, \Sigma)$, where $\Sigma\,((p+q) \times (p+q))$ is symmetric p.d. with
submatrices $\Sigma_{11}(p \times p)$, $\Sigma_{22}(q \times q)$ and $\Sigma_{12}(p \times q)$, and $\xi_i\,((p+q) \times 1)$ has column
subvectors $\xi_{1i}(p \times 1)$ and $\xi_{2i}(q \times 1)$, we can set, in the following way, confidence bounds
on $\xi_{1i} - \Sigma_{12}\Sigma_{22}^{-1}\xi_{2i}$, which are departures from the hypothesis $\xi_{1i} - \Sigma_{12}\Sigma_{22}^{-1}\xi_{2i} =
0$ $(i = 1, 2, \ldots, k)$. Let $S_{11}$, $S_{22}$ and $S_{12}$ stand for the submatrices of the "within"
covariance matrix pooled from k samples of size n each, and $\bar x_{1i}(p \times 1)$ and $\bar x_{2i}(q \times 1)$ $(i = 1, \ldots, k)$
for the subvectors of the k sample mean vectors. Then, setting $S_{11\cdot 2} = S_{11} - S_{12}S_{22}^{-1}S_{21}$,
and writing $\hat B(p \times k)$ and $B(p \times k)$ for the matrices with respective column vectors $\bar x_{1i} - S_{12}S_{22}^{-1}\bar x_{2i}$
and $\xi_{1i} - \Sigma_{12}\Sigma_{22}^{-1}\xi_{2i}$ $(i = 1, \ldots, k)$, we have, with a confidence coefficient, say $1 - \alpha$, the
following set of simultaneous confidence bounds (for all arbitrary nonnull $a'(1 \times p)$ and
unit-length $b(k \times 1)$): $a'\hat Bb - [k(a'S_{11\cdot 2}a)c_\alpha(p, k, nk - k)]^{1/2} \le a'Bb \le a'\hat Bb +
[k(a'S_{11\cdot 2}a)c_\alpha(p, k, nk - k)]^{1/2}$, where $c_\alpha(p, k, nk - k)$ is the upper $\alpha$-point of the
distribution of the (central) largest determinantal root based on p, k and nk − k degrees of freedom. A test for the
associated hypothesis is also easily obtained.


26. The Efficiency of Tests, Wassily Hoeffding and Joan R. Rosenblatt, University
of North Carolina.


The efficiency of a family of tests is defined. Let $\{X_n\}$ be a sequence of random variables
such that for every $n$ the vector $(X_1, \ldots, X_n)$ has c.d.f. $G_n$ in some class $C_n$. Let $C_{1n}$, $C_{2n}$
be disjoint subsets of $C_n$ such that we prefer one or the other of two alternatives $A_1$, $A_2$
according as $G_n \in C_{in}$ $(i = 1, 2)$. Given $\alpha_1$, $\alpha_2$, we say that the problem $(\{C_{1n}\}, \{C_{2n}\}, \alpha_1, \alpha_2)$
is solved by a test (general nonsequential two-decision rule) $\phi_n$ such that
$P(\phi_n \text{ selects } A_i \mid G_n) \ge 1 - \alpha_i$ for all $G_n \in C_{in}$ $(i = 1, 2)$. The index of efficiency of a family
of tests $\mathcal{T}$ for the problem $(\{C_{1n}\}, \{C_{2n}\}, \alpha_1, \alpha_2)$ is $N(\mathcal{T}) = N(\mathcal{T}, \{C_{1n}\}, \{C_{2n}\}, \alpha_1, \alpha_2)$, the
least sample size with which the problem can be solved by a member of the family $\mathcal{T}$. If
$\mathcal{T}_1$, $\mathcal{T}_2$ are two families of tests, the efficiency of $\mathcal{T}_2$ relative to $\mathcal{T}_1$ is given by
$\mathrm{eff}(\mathcal{T}_2/\mathcal{T}_1) = N(\mathcal{T}_1)/N(\mathcal{T}_2)$. The determination of $N(\mathcal{T})$ is closely related to finding a test which
maximizes the minimum power. Let $\theta(G_n)$ be a real-valued function of $G_n$ and suppose
$C_{1n} = \{G_n : \theta(G_n) \le \theta_1\}$, $C_{2n} = \{G_n : \theta(G_n) \ge \theta_2\}$, $\theta_1 < \theta_2$. Under suitable assumptions, we derive
asymptotic expressions for $N(\mathcal{T})$ as $\delta = \theta_2 - \theta_1$ tends to zero while $\alpha_1$, $\alpha_2$ remain fixed.


27. On a Decision Procedure to Select the Population with the Largest Mean 
(Preliminary Report), R. C. Bose and Jacques St. Pierre, University of 
North Carolina. 


Consider $n + 1$ independent normal populations with unknown means $m_0 \ge
m_1 \ge m_2 \ge \cdots \ge m_n$, respectively, and with known or unknown common variance $\sigma^2$. Suppose
a sample of size $N$ is available from each population, and a decision procedure is required
to select the population with the largest mean, with the following properties. (a) Either
a decision is made that the population from which the $i$th sample was drawn has the largest
mean, or no decision is made. (b) The probability of making a wrong decision (if a
decision is made) is less than a pre-assigned number $\alpha_0$ (independent of the unknown means
$m_0, m_1, \ldots, m_n$). Subject to requirements (a) and (b), the decision rule must control
the chance of indecision. The case of three populations with known $\sigma^2$ is considered in
detail, and the properties of a decision rule based on the auxiliary statistic $y = \bar{x}_{(0)} - \bar{x}_{(1)}$
are studied, where $\bar{x}_{(0)} \ge \bar{x}_{(1)} \ge \bar{x}_{(2)}$ are the ordered sample means, the rule being to decide
that $\bar{x}_{(0)}$ comes from the population with the largest mean if $y > k$, and not to take a
decision if $y \le k$. The general case when $n > 2$ and $\sigma^2$ is unknown is under consideration.
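The sketch below illustrates the rule just described for the three-population case with known $\sigma^2$. It is an illustrative reading, not the authors' procedure. The threshold $k$ is taken as an input, because the abstract does not say how it is chosen to meet requirement (b). The true means are assumed to have a unique largest value. `select_largest` and `operating_characteristics` are hypothetical names. The Monte Carlo routine estimates, for one configuration of means, the probability of a wrong decision and the chance of indecision.

```python
import random


def select_largest(sample_means, k):
    """Index of the population declared to have the largest mean, or None.

    Rule as summarized in the abstract: with ordered sample means
    xbar_(0) >= xbar_(1) >= ..., decide for xbar_(0) if
    y = xbar_(0) - xbar_(1) > k; otherwise take no decision.
    """
    if len(sample_means) < 2:
        raise ValueError("need at least two populations")
    order = sorted(range(len(sample_means)),
                   key=lambda i: sample_means[i], reverse=True)
    y = sample_means[order[0]] - sample_means[order[1]]
    return order[0] if y > k else None


def operating_characteristics(means, sigma, N, k, reps=20000, seed=1):
    """Estimate P(wrong decision) and P(no decision) by simulation.

    Assumes a known common sigma, so each sample mean is N(m_i, sigma^2 / N),
    and that exactly one population has the largest true mean.
    """
    top = max(means)
    if sum(1 for m in means if m == top) != 1:
        raise ValueError("this sketch assumes a unique largest mean")
    best = list(means).index(top)
    rng = random.Random(seed)
    se = sigma / N ** 0.5  # standard error of each sample mean
    wrong = undecided = 0
    for _ in range(reps):
        xbars = [rng.gauss(m, se) for m in means]
        choice = select_largest(xbars, k)
        if choice is None:
            undecided += 1
        elif choice != best:
            wrong += 1
    return wrong / reps, undecided / reps
```

Raising $k$ makes a wrong decision less likely but increases the chance of indecision. This is the trade-off that requirements (a) and (b), together with control of indecision, are meant to balance.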


28. Most Economical Multiple-Decision Rules, William Jackson Hall, University
of North Carolina.


Suppose $x$ has an unknown distribution function $F$, belonging to one of $m$ disjoint classes
$\omega_1, \ldots, \omega_m$, and suppose $A_1, \ldots, A_m$ are corresponding alternative decisions. A decision
rule $D_N$, based on a sample of size $N$, is said to be a "most economical multiple-decision
rule (M.E. d.r.) relative to $(\alpha_1, \ldots, \alpha_m)$, $0 \le \alpha_i < 1$, for choosing among $A_1, \ldots, A_m$"
if it satisfies (1) $\Pr(D_N \text{ chooses } A_i \mid F) \ge \alpha_i$ for all $F \in \omega_i$ $(i = 1, \ldots, m)$ and if $N$ is the
least integer $n$ for which (1) can be satisfied. It is proved that to obtain M.E. d.r.'s one
need only consider d.r.'s in the sequence $\{D^0_n\}$, $n = 0, 1, 2, \ldots$, where $D^0_n$ denotes a minimax
solution w.r.t. a certain weight function for samples of fixed size $n$. If $\omega_i$ contains but
one distribution $F_i$ $(i = 1, \ldots, m)$, $D^0_n$ is of the form: (2) choose $A_i$ if $a_i L_i \ge a_j L_j$
$(j = 1, \ldots, m)$, where $L_1, \ldots, L_m$ are the likelihood functions of the sample corresponding to
$F_1, \ldots, F_m$ and $a_1, \ldots, a_m$ are positive constants. In the general case, $D^0_n$ is of a similar
form where now $F_1, \ldots, F_m$ are "average" distribution functions, averaged w.r.t. least
favorable conditional distributions over $\omega_1, \ldots, \omega_m$ (if existent). Similar results are
obtained for "M.E. d.r.'s relative to $(\beta_{ij})$, $0 < \beta_{ij} \le 1$," defined as above with (1) replaced
by (1′) $\Pr(D_N \text{ chooses } A_j \mid F) \le \beta_{ij}$ for all $F \in \omega_i$ $(i \ne j;\ i, j = 1, \ldots, m)$; and (2) is replaced
by (2′) choose $A_i$ if $\sum_{p \ne i} b_{pi} L_p \le \sum_{p \ne j} b_{pj} L_p$ $(j = 1, \ldots, m)$, for some positive constants
$b_{ij}$. Other properties of the d.r.'s are derived and various extensions and examples are
given.
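Form (2) above applies when each $\omega_i$ contains a single distribution. It amounts to choosing the decision with the largest weighted likelihood $a_i L_i$. The sketch below is a hypothetical illustration only. It takes $F_1, \ldots, F_m$ to be normal distributions with known means and a common known standard deviation. It treats the constants $a_i$ as given inputs; in the abstract they come from the minimax construction, which is not reproduced here. It also compares $\log a_i + \log L_i$ rather than $a_i L_i$, to avoid numerical underflow.

```python
import math


def normal_loglik(sample, mu, sigma=1.0):
    """Log-likelihood of an i.i.d. N(mu, sigma^2) sample."""
    return sum(
        -0.5 * math.log(2.0 * math.pi * sigma ** 2)
        - (x - mu) ** 2 / (2.0 * sigma ** 2)
        for x in sample
    )


def weighted_likelihood_choice(sample, mus, weights, sigma=1.0):
    """Index i with a_i * L_i >= a_j * L_j for all j (form (2) in the abstract).

    mus[i] is the mean of the hypothetical simple distribution F_i and
    weights[i] is the positive constant a_i. Ties go to the smallest index,
    because max() returns the first maximal element.
    """
    if not mus or len(mus) != len(weights):
        raise ValueError("need one weight per distribution")
    if any(a <= 0 for a in weights):
        raise ValueError("weights a_i must be positive")
    scores = [math.log(a) + normal_loglik(sample, mu, sigma)
              for mu, a in zip(mus, weights)]
    return max(range(len(scores)), key=lambda i: scores[i])
```

With equal weights the rule reduces to ordinary maximum likelihood selection. Unequal $a_i$ shift the boundaries between the decision regions toward or away from particular alternatives.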


NEWS AND NOTICES 
Readers are invited to submit to the Secretary of the Institute news items of interest 


Personal Items 


Archie Blake is now employed as an Advisory Engineer in the Systems Anal- 
ysis of Westinghouse Electric Corporation, Baltimore, Maryland. 

E. L. Cox has left the Operations Research Group, Case Institute of Technology,
to take a position with Chemical Corps Biological Laboratories, Frederick, 
Maryland. 

Harold Davis has transferred from Headquarters, United States Air Force,
to the Operations Analysis Office, Hq. Far East Air Forces.

Professor Hilda Geiringer is on leave of absence from Wheaton College in 
order to complete and prepare for publication on behalf of Harvard University 
some of the posthumous work of Richard von Mises.

Dr. S. G. Ghurye has accepted the position of Reader in Statistics, Department
of Mathematics and Statistics, University of Lucknow, Lucknow, U.P.,
India. 
The University of London has conferred the D.Sc. degree in Mathematical
Statistics on H. O. Hartley for his published contributions in the field of
mathematical statistics, and he has accepted a position as professor on the permanent
staff of the Department of Statistics and Statistical Laboratory, Iowa State
College. Dr. Hartley has been visiting professor of statistics at Iowa State
College since July 1, 1954.

Stanley L. Isaacson has returned to his position as Assistant Professor of 
Statistics at Iowa State College after spending a year on leave of absence as
Visiting Assistant Professor of Statistics at Stanford University. 

Dr. Hans Kellerer was appointed full Professor of Statistics and Director
of the Statistical Seminar at the Wirtschafts- und Sozialwissenschaftliche
Fakultät of the Freie Universität in West Berlin on April 1, 1953.

Dr. Nathan Keyfitz has resumed his duties as Senior Research Statistician of 
the Dominion Bureau of Statistics after a year’s assignment with the Govern- 
ment of Indonesia. 

Professor T. C. Koopmans will be on leave of absence from the University of 
Chicago and the Cowles Commission during the academic year 1954-55. He 
will spend this year at Yale University in teaching and research. 

Dr. Richard F. Link has left the Analytical Research Group at the Forrestal 
Research Center, Princeton University to accept a position with the Sandia 
Corporation of Albuquerque, New Mexico. 

Lt. Carl R. Ohman, formerly a graduate student at Princeton University, is 
now on active duty with the U. S. Army, stationed in Washington, D. C.

Donald B. Owen has resigned as Assistant Professor of Mathematics at Purdue 
University to accept a position as a Staff Member with the Sandia Corporation 
in Albuquerque, New Mexico. 

K. C. S. Pillai, who was a Research Associate in the Department of Statistics,
University of North Carolina, joined the Statistical Office of the United Nations,
New York, in February, 1954. He has completed his work for the Ph.D. degree in
Statistics at the University of North Carolina.

John W. Richardson is employed as a Physicist at the Ramo-Wooldridge 
Corporation, Los Angeles, California. 

Daniel E. Sands has accepted a position as Biometrician in the Statistics 
Section of the Squibb Institute for Medical Research, New Brunswick, New 
Jersey. 

Major Oliver A. Shaw is now stationed at Hq. Air Research and Development 
Command where he is serving as Research Administrator in Mathematics and 
Mathematical Statistics. 

George W. Snedecor has been elected an Honorary Fellow of the Royal Sta- 
tistical Society “in consideration of the eminent services rendered to statistics.” 

James H. Straughan of Michigan State College has accepted a position as 
Assistant Professor, Department of Psychology, Montana State University, 
Missoula, Montana. 
Dr. Joseph V. Talacko, Assistant Professor of Mathematics, is on leave of
absence from Marquette University, Milwaukee, for the academic year
1954-55. He has a Ford Foundation Fellowship and plans to spend most of the
year at the Statistical Laboratory, University of California in Berkeley and to
visit several west coast universities.

Dr. Fred H. Tingey, formerly with the General Electric Company, Richland, 
Washington, has joined the staff of the Atomic Energy Division of Phillips 
Petroleum Company at Idaho Falls, Idaho where he will be responsible for all 
statistical studies arising from plant operations. 

M. C. Throdahl, formerly Development Manager for Rubber Chemicals, 
Monsanto Chemical Company at Nitro, West Virginia, has been given a new 
assignment as Assistant Director of the company’s Development Department 
in St. Louis. 

Elizabeth Vaughan has transferred from the U.S. Fish and Wildlife Service 
to the Quality Evaluation Laboratory, Naval Ammunition Depot, Bangor, 
Washington, as an Analytical Statistician and head of the section on special 
investigations.

David L. Wallace, formerly at Massachusetts Institute of Technology, has 
been appointed an Assistant Professor of Statistics at the University of Chicago. 

Samuel Weiss, formerly Chief Statistician and Chief of the Office of Statistical 
Standards, Bureau of Labor Statistics, has recently established a private statis- 
tical consulting office in Washington, D. C. He will, however, continue to act 
as a consultant to the Commissioner of Labor Statistics. 


A. W. Wortham has completed the requirements for a Ph.D. degree at Okla- 
homa A. and M. College and has returned to his position with Chance Vought 
Aircraft, Dallas, Texas. 

R. I. Piper of the Pacific Telephone and Telegraph Company, San Francisco, 
California, died on October 20, 1953 at the age of fifty-one years. He was a 
member of the Institute for ten years. 


Fellowships in Statistics, University of Chicago 


Members of the Institute of Mathematical Statistics are invited to nominate 
research workers whom they feel could benefit from three $4000 post-doctoral 
fellowships in Statistics offered for 1955-56 by the University of Chicago. The 
purpose of these fellowships, which are open to holders of the doctor’s degree or 
its equivalent in research accomplishment, is to acquaint established research 
workers in the biological, physical, and social sciences with the role of modern 
statistical analysis in the planning of experiments and other investigative pro- 
grams, and in the analysis of empirical data. The development of the field of 
Statistics has been so rapid that most current research falls far short of attainable 
standards, and these fellowships (which represent the fifth year of a five-year 
program supported by The Rockefeller Foundation) are intended to help reduce 
this lag by giving statistical training to scientists whose primary interests are in 
substantive fields rather than in Statistics itself. Nominations, which should be
made soon, since the closing date for applications is February 15, 1955, may be
addressed to any member of the University of Chicago Statistics Department, 
or to its Chairman, W. Allen Wallis. 




First National Meeting, Society for Industrial and Applied Mathematics 


The Society for Industrial and Applied Mathematics will hold its first national 
meeting in conjunction with the annual meetings of the American Mathematical 
Society, the Mathematical Association of America, and the Association for Sym- 
bolic Logic at the University of Pittsburgh, on December 27-29. The following 
addresses will be presented at an evening meeting: "The History of a Problem",
Dr. Brockway McMillan, Bell Telephone Laboratories; "The Control of Industrial
Operations", Professor Herbert A. Simon, Carnegie Institute of Technology;
"Probability Theory in Liability and Property Insurance", Mr. C. W. Crouse,
Actuary, Preslan and Company. Further information can be obtained from H.
W. Kuhn, Dalton Hall, Bryn Mawr College, Bryn Mawr, Pennsylvania. 




New Members 
The following persons have been elected to membership in the Institute 
May 14, 1954 to August 9, 1954 


Abbott, James H., M.S. (Southern Methodist Univ.), Graduate Student, University of 
Illinois, Box 64, University Station, Urbana, Illinois. 

Atkinson, Richard C., Ph. B. (Univ. of Chicago), Research Assistant, Psychology 
Department, Indiana University, Bloomington, Indiana. 

Austin, Thomas L., Jr., BBA (Univ. of Georgia), Graduate Assistant, University of Georgia,
College of Business Administration, Athens, Georgia, 195 Thirteenth Street, N.E., 
Atlanta, Georgia. 

Baade, William H., B.S. (Mass. College of Pharmacy), Chemist, Process Research and 
Development, Merck and Company, Inc., Rahway, New Jersey. 

Barnes, Gerald W., M.A. (Univ. of Arkansas), Teaching Fellow, Indiana University,
Department of Psychology, Bloomington, Indiana.

Binford, J. R., A.B. (DePauw), Teaching Fellow, Indiana University, Psychology Depart- 
ment, Bloomington, Indiana. 

Blumenthal, Robert M., B.A. (Oberlin College), Graduate Student and Teaching Assistant, 
Department of Mathematics, White Hall, Cornell University, Ithaca, New York. 

Box, George E. P., Ph.D. (London), Statistician, Statistical Research Section, M.C.S.D.,
Imperial Chemical Industries, Dyestuffs Division, Blackley, Manchester 9, England.

Bramwell, W. K., Jr., B.S. (Arizona Univ.), Vice President, Hardin County Savings Bank, 
Eldora, Iowa. 

Brunelle, Robert H., B.A. (Univ. of the State of New York, Champlain College), Graduate 
Student, Institute of Statistics, University of North Carolina, Chapel Hill, North 
Carolina.

Buchbinder, Benjamin, B.A. (Brooklyn College), Graduate Student, Department of
Statistics, University of North Carolina, Chapel Hill, North Carolina.

Carterette, E. C. H., B.A. (Harvard Univ. Honors), Graduate Student and Research
Assistant, Hearing and Communication Laboratory, Department of Psychology, Indiana
University, Bloomington, Indiana.

Chu, John T., Ph.D. (Iowa State College), Research Associate, Department of Statistics, 
University of North Carolina, Chapel Hill, North Carolina, Box 182, Chapel Hill, North
Carolina. 

Cisin, Ira H., M.A. (American Univ.), Director of Research in Motivation, Morale and 
Leadership, Human Resources Research Office, George Washington University, 2013 
G Street N.W., Washington, D. C. 

Colton, Theodore, B.A. (Brooklyn College), Graduate Student, Institute of Statistics, 
University of North Carolina, Chapel Hill, North Carolina. 

Coons, Irma J., B.S. (Harding College), Graduate Student, Iowa State College, Ames, 
Iowa, Bevier House, Iowa State College, Ames, Iowa. 

Davis, Don Allen, B.A. (Western Washington College of Education), Teaching Assistant, 
University of Washington, Seattle 5, Washington, 374814 University Way, Seattle 5, 
Washington. 

Dillon, Thaddeus, III, M.S. (John Carroll Univ.), Instructor in Mathematics, John Carroll 
University, Cleveland 18, Ohio. 

Doan, Elizabeth, A.B. (Oberlin College), Graduate Student in Mathematical Statistics, 
University of North Carolina, Chapel Hill, North Carolina, Kenan Dormitory, Chapel 
Hill, North Carolina. 

Doto, Irene L., M.A. (Temple Univ.), Instructor, Temple University, Broad Street and 
Montgomery Avenue, Philadelphia 22, Pennsylvania, 5949 Nassau Road, Philadelphia 
31, Pennsylvania. 

Ecimovic, Juraj P., diploma (Faculty of Economics, Zagreb), Expert on Statistical Sampling 
in Agriculture of the Food and Agriculture Organization of the United Nations, Technical
Assistance Mission to Indonesia, Djakarta, Djalan Hajam Wuruk 6, Indonesia.

Ferguson, Barbara June, B.S. (Univ. of Washington), Graduate Student, Mathematics 
Department, University of Washington, Seattle, Washington, 2736 60th Avenue, S.W., 
Seattle 6, Washington. 

Frankmann, Raymond W., A.B. (Harvard Univ.), Graduate Student, Department of Psy- 
chology, Indiana University, Bloomington, Indiana. 

Free, Spencer M., Jr., Ph.D. (North Carolina State College), Research Statistician, Smith
Kline & French Laboratories, 1530 Spring Garden Street, Philadelphia 1, Pennsylvania. 

Gart, John J., B.S. (DePaul Univ.), Graduate Assistant, Mathematics Department, Mar- 
quette University, Milwaukee 3, Wisconsin, 4925 N. Hamilton Avenue, Chicago 26, 
Illinois. 

Goen, Richard L., B.S. (Univ. of Washington), Graduate Student and Teaching Fellow, 
University of Washington, Seattle 5, Washington, 4407 Densmore, Seattle 3, Washington. 

Hudson, John B., B.A. (Univ. of Oregon), Graduate Student and Research Assistant,
Department of Sociology, University of Washington, Seattle 5, Washington.

Imhof, Jean Pierre, Licence ès Sciences Mathématiques (Univ. of Geneva, Switzerland),
Graduate Student, University of California, Berkeley, California, 2310 Ellsworth Apt. 
1, Berkeley 4, California. 

Iqbal, M., M.A. (Panjab Univ., Lahore, Pakistan), Senior Lecturer in Statistics, Panjab 
University, Lahore, Pakistan, Statistics Department, University of North Carolina, Chapel 
Hill, North Carolina. 

Jerger, James F., M.A. (Northwestern Univ.), Research Audiologist, Northwestern Uni- 
versity, Evanston, Illinois, Hearing Clinic, Speech Annex Building, Northwestern 
University, Evanston, Illinois. 

Jones, Lawrence, B.S. (Denison Univ.), Graduate Student, Statistics Department, Iowa 
State College, Ames, Iowa, 2010 McCarthy Road, Ames, Iowa. 

Kelley, H. Paul, M.A. (Princeton Univ.), Psychometric Fellow, Educational Testing 
Service, 20 Nassau Street, Princeton, New Jersey. 
Krakowski, Martin, Ph.D. (Carnegie Inst. of Tech.), Research Mathematician, California
Research Corporation, P. O. Box 446, LaHabra, California.

Krueger, William D., B.B.A. (Western Reserve Univ.), Market Analyst, Harris-Seybold 
Company, 4510 East 71st Street, Cleveland 5, Ohio. 

Kudō, Hirokichi, Rigakushi (Tokyo Univ.), Assistant Professor of Mathematics, Ochanomizu
University, Department of Mathematics, Ootsukamachi, Bunkyo-ku, Tokyo,
Japan. 

Lancaster, Henry Oliver, Ph.D. (Univ. of Sydney), Lecturer in Medical Statistics, School 
of Public Health and Tropical Medicine, University Post Office, Sydney, Australia. 

Lehman, Eugene H., Jr., M.A. (Teachers College, Columbia Univ.), Graduate Student in
Mathematical Statistics and Research Assistant in Statistics, Highway Cost Allocation
Project, Civil Engineering Department, University of Washington, Seattle 5, Washington,
$408 Goldendale Place, Union Bay Village, University of Washington, Seattle 6,
Washington.

Lemus, Ferdinand, B.A. (Michigan State Normal College), Graduate Student in statistics 
at Iowa State College, Ames, Iowa.

McKean, Harlley E., M.S. (Purdue Univ.), Research Assistant, Purdue University, Lafayette,
Indiana, 5-4 Ross-Ade Drive, West Lafayette, Indiana.

Maggio, Ralph A., A.B. (Boston Univ.), Graduate Student, Applied Statistics, Rutgers 
University, New Brunswick, New Jersey, Apartment 42-A, Codrington Apartments, 
Bound Brook, New Jersey. 

Meshadani, Mahmoud H., M.S. (Iowa State College), Graduate Student, Iowa State College,
Ames, Iowa, 444/57 Jami Atta, Baghdad, Iraq.

Miller, Clark T., M.A. (Univ. of Wisconsin), Graduate Student, Department of Mathematics,
University of Wisconsin, Madison 6, Wisconsin, 1553 Adams Street, Madison 6,
Wisconsin. 

Moser, Joseph M., B.A. (St. John’s Univ.), Graduate Student and Assistant Instructor, 
St. Louis University, St. Louis 3, Missouri, 359 N. Whittier Street, St. Louis 8, Missouri. 

Na Nagara, Prasert, M.S. (Cornell Univ.), Graduate Student, Cornell University, Ithaca, 
New York, 107 Harvard Place, Ithaca, New York. 

Noel, Lionel M., B.S. (U.S. Naval Academy), Lieutenant, U.S. Navy, Washington, D. C., 
413-A Devereux Avenue, Princeton, New Jersey. 

Page, Ewan S., M.A. (Cantab.), Lecturer in Mathematical Statistics, 55 Stanley Drive,
Leicester, England. 

Peterson, Henry H., B.S. (Univ. of Washington), Graduate Student and Teaching Assistant,
University of Washington, Seattle, Washington, 8336 82nd Avenue, N.W.,
Seattle 7, Washington. 

Porter, Lee Burton, MBA (Univ. of Michigan), Actuarial Assistant, North Carolina Mutual 
Life Insurance Company, P. O. Box 201, Durham, North Carolina.

Richardson, Jack, M.S. (Northwestern Univ.), Graduate Student, Psychology Department,
Northwestern University, Evanston, Illinois. 

Salveson, Melvin E., Ph.D. (Univ. of Chicago), Procedures Research Manager, Major 
Appliance Division, General Electric Company, Appliance Park, Louisville 1, Kentucky.

Schwartz, Lorraine, B.A. (Univ. of California), Research Assistant, Statistical Labora- 
tory, University of California, Berkeley, California, 2208 Bancroft Way, Berkeley 4, 
California. 

Shapiro, Marvin, B.A. (Cornell Univ.), Graduate Student, Columbia University, New York 
27, New York, 3202 Cleveland Avenue, Niagara Falls, New York.

Sheldahl, Loren R., B.A. (Iowa State Teachers College), Instructor, School of Public
Health, University of Minnesota, Minneapolis 14, Minnesota. 

Soto-Alarcon, Jose L., B.S. (Univ. of Puerto Rico), Graduate Student, The Johns Hopkins 
University, 615 N. Wolfe St., Baltimore 5, Maryland.
Strehler, Allen F., Ph.D. (Univ. of Wisconsin), Head, Department of Mathematics, The
Lovelace Foundation for Medical Education and Research, 4800 Gibson Boulevard,
S.E., Albuquerque, New Mexico.

Takano, Kinsaku, Rigakushi (Faculty of Science, Osaka Univ.), Member of the Institute 
of Statistical Mathematics, No. 203 Toei Apartments, 1548 Hatagaya-Nakamachi, Shibuya-ku,
Tokyo, Japan.

Tamargo, M. A., M.S. (Louisiana State Univ.), Head of the Grain & Root Crop Bureau 
(Acting Chief Statistics Department, Fiber Coop. Commission, Agricultural Experi- 
ment Station, Santiago Vegas, Cuba), Ministerio de Agricultura, Havana, Cuba, Linea 
656 altos ent. A y B, Vedado-Havana, Cuba. 

Taylor, Howard L., M.S. (Iowa State College), Associate and Student, Statistical
Laboratory, Iowa State College, Ames, Iowa, 1323 Carroll Avenue, Ames, Iowa.

Tucker, Thomas F., M.B.A. (Univ. of Pennsylvania, Wharton School), Assistant Actuary, 
Commonwealth of Pennsylvania, Insurance Department, 600 Packard Building, Phil- 
adelphia 2, Pennsylvania. 

Van Der Byl, Willem, Doctor (State Univ. of Utrecht, Netherlands), Research Associate 
KNMI, Koninklyk Nederlands Meteorologisch Instituut, KNMI, DeBilt, Netherlands. 

Van Natta, Pearl A. (Mrs. E.A.), M.A. (Univ. of Oregon), Research Assistant in Mathemati- 
cal Statistics, University of Oregon, Eugene, Oregon. 

Weber, Donald C., M.S. (Univ. of Wisconsin), Instructor in Mathematics, North Dakota 
Agricultural College, State College Station, Fargo, North Dakota, 414 S. Seventh 
Avenue, Wausau, Wisconsin. 

Weigel, E. Charles, B.S. (Rutgers Univ.), Accountant, Harry Rubenstein, C.P.A., 313 State 
Street, Perth Amboy, New Jersey, 423 Compton Avenue, Perth Amboy, New Jersey. 

Wertz, Frederick E., B.S. (Manhattan College), Graduate Student, Rutgers University, 
New Brunswick, New Jersey, 121 West 69th Street, New York 23, New York. 

White, Robert F., M.S. (Univ. of Connecticut), Graduate Assistant and Student, Iowa State 
College, Statistical Laboratory, Ames, Iowa. 

Witeck, Eugene F., B.S. (Marquette Univ.), Graduate Student, University of Washington, 
Seattle, Washington, 8335 13th Avenue, N.W., Seattle 7, Washington. 

Young, Gregory O., M.S. (California Inst. of Tech.), Research Engineer, Technical Staff, 
Hughes Aircraft Company, Department 04, Culver City, California. 

Zastrow, Marvin C., B.S. (Wisconsin State College), Graduate Assistant in mathematics, 
Marquette University, Milwaukee, Wisconsin, 1901 S. 24th Street, Milwaukee 4, Wisconsin.




REPORT OF THE MONTREAL MEETING OF THE INSTITUTE 


The sixteenth summer meeting, the 62nd meeting of the Institute of Mathematical
Statistics, was held in Montreal, Canada, September 10-13, 1954. Other
organizations meeting in Montreal at the same time were the American Statisti- 
cal Association, the Econometric Society, the American Society for Quality 
Control (Montreal Section), and the Biometric Society (Eastern North American 
Region). On the evening of September 10, the City of Montreal held a reception 
for the members of the societies, and on the evening of September 11, there was 
an informal party for the members of the societies. 

The following 199 members of the Institute registered for the meeting: 


Helen Abbey, R. L. Anderson, F. J. Anscombe, K. J. Arnold, J. C. Bain, J. D. Bankier,
R. Bechhofer, C. A. Bennett, J. Berkson, C. A. Bicking, C. I. Bliss, I. Blumen, R. C. Bose,
A. H. Bowker, R. A. Bradley, A. E. Brandt, R. Brickley, S. H. Brooks, R. W. Burgess, P. J.
Burke, I. W. Burr, L. D. Calvin, A. G. Carlton, D. G. Chapman, A. Charnes, H. Chernoff,
C. W. Churchman, A. G. Clark, W. G. Cochran, A. C. Cohen, Jr., E. L. Cox, Gertrude M.
Cox, J. H. Curtiss, C. Daniel, G. B. Dantzig, R. B. Dawson, Jr., B. B. Day, A. De la Garza,
F. Del Priore, D. B. DeLury, W. E. Deming, L. Derrick, J. L. Dobby, T. Donnelly, J. E.
Dowd, J. R. Duffett, A. J. Duncan, D. B. Duncan, C. W. Dunnett, G. L. Edgett, C. Eisenhart,
B. Epstein, J. W. Fertig, D. Fraser, J. E. Freund, Kathryn Froelich, L. Gerende, R.
Goodman, C. H. Goulden, M. H. Gourary, F. Graybill, B. G. Greenberg, S. W. Greenhouse,
L. Gunlogson, P. Gunther, M. Gurney, I. Guttman, R. K. Haddad, R. J. Hader, K. W.
Halbert, M. Halperin, J. F. Hannan, M. H. Hansen, G. W. Brier, G. M. Harrington, B.
Harris, B. Harshbarger, H. L. Harter, L. H. Herbach, G. R. Herd, W. Hoeffding, R. G.
Hoffman, P. G. Homeyer, W. C. Hood, L. H. Hook, W. H. Horton, H. Hotelling, E. k
Houseman, C. J. Hoyt, J. F. Hudson, J. S. Hunter, P. Irick, J. E. Jackson, Carol M. Jaeger,
H. L. Jones, M. Kac, L. Katz, Harriet J. Kelly, N. Keyfitz, A. W. Kimball, B. F. Kimball,
E. P. King, C. J. Kirchen, L. Kish, K. H. Kramer, W. Kruskal, R. B. Ladd, R. O. Laine,
T. A. Lamke, F. C. Leone, R. Lessard, G. J. Lieberman, R. Likert, B. Lipstein, S. B.
Littauer, G. F. Lunger, C. J. Maloney, J. Mandel, E. S. Marks, R. H. Matthias, J. W. Mauchly,
P. Meier, Margaret Merrell, W. J. Merrill, H. A. Meyer, R. Mirsky, S. Monro, C. B. Moore,
R. Moore, M. Morrison, N. Morse, J. Moshman, G. E. McCreary, Horace W. Norton, James
A. Norton, Jr., G. E. Noether, F. G. Cornell, G. E. Nicholson, Jr., Ingram Olkin, E. G.
Olds, Robert E. Odeh, W. R. Pabst, Jr., Boyd Z. Palmer, W. E. Patte, A. E. Paull, E. W.
Pike, Howard Raiffa, Mrs. L. K. Randolph, J. S. Rhodes, Robert Roeloffs, John H.
Roseboom, H. M. Rosenblatt, Joan R. Rosenblatt, Murray Rosenblatt, Irving Roshwalb, S. N.
Roy, Jacques St. Pierre, Daniel E. Sands, A. E. Sarhan, F. E. Satterthwaite, Henry Scheffé,
Marvin Schneiderman, H. L. Seal, Daniel Seigel, Richard H. Shaw, Loren R. Sheldahl,
David Sheppard, Walt R. Simmons, H. F. Smith, Walter Smuk, Herbert Solomon, Paul N.
Somerville, F. F. Stephan, David S. Stoller, Hale C. Sweeney, Nancy Symons, Donovan J.
Thompson, W. A. Thompson, Jr., William R. Thompson, Leo J. Tick, John W. Tukey,
Charles R. M. Tuttle, G. W. Tyler, W. Vander Byl, D. F. Votaw, Jr., A. J. Wadman, John
R. Walter, Lionel Weiss, Samuel Weiss, John S. White, Alfred G. Whitney, William Wolman,
Max A. Woodbury, R. Wormleighton, A. W. Wortham, W. J. Youden, Samuel Zahl, Marvin
Zelen, George Zyskind.


The Program was as follows: 


FRIDAY, SEPTEMBER 10, 1954 


10:00 a.m. Theory of Sampling Fish Populations (Cosponsored by A.S.A. 
and Biometric Society) 


Chairman: D. B. DeLury, Ontario Research Foundation 
Papers: 1. Combined Estimation Methods in Sampling Fish Populations, D. G. Chapman,
University of Washington
2. Some Admissible Tag-Recapture Procedures, D. S. Robson, Cornell
University
Discussion: E. L. Cox, Case Institute of Technology 


3:00 p.m. The Joint Effects of Reading Errors and Grouping on Standard 
Methods of Statistical Inference. (Invited address). (Cosponsored 
by A.S.A. and Biometric Society) 


Chairman: R. L. Anderson, North Carolina State College 
Speaker: Churchill Eisenhart, National Bureau of Standards 







4:00 p.m. Multiple Comparison and Multiple Decision Procedures. (Co- 
sponsored by A.S.A. and Biometric Society) 


Chairman: John W. Tukey, Princeton University 
Papers: 1. A Survey of Multiple Comparison Procedures, Jerome Cornfield, National 
Institute of Health 
2. A Survey of Multiple Decision Procedures, Robert Bechhofer, Cornell 
University 
Discussion: R. C. Bose, North Carolina State College and W. G. Cochran,
Johns Hopkins University


7:15 p.m. Council Meeting 


SATURDAY, SEPTEMBER 11, 1954 


10:30 a.m. Invited Addresses (Cosponsored by Econometric Society) 


Chairman: Howard Raiffa, Columbia University
Papers: 1. Estimation of the Components of Stochastic Structures, J. Wolfowitz, 
Cornell University 
2. Nonparametric Large Sample Theory, Wassily Hoeffding, University of 
North Carolina, Special invited address. 


2:00 p.m. Contributed Papers I 


Chairman: G. Lieberman, Stanford University 
Papers: 1. On Quadratic Estimates of Variance Components in Balanced Models,
A. W. Wortham, Chance Vought Aircraft and Oklahoma A and M
College
2. The Coefficients in the Best Linear Estimate of the Mean in Symmetric
Populations, A. E. Sarhan, University of North Carolina
3. Distribution of Linear Contrasts of Order Statistics, Jacques St. Pierre,
University of North Carolina
4. Note on Fourier Periodogram Analyses of Time Series, B. F. Kimball,
New York State Public Service Commission
5. Univariate Two-Population Distribution-Free Discrimination, David
S. Stoller, Rand Corporation
6. New Types of Easily Constructed Partially Balanced Incomplete Block
Designs, John Mandel and Marvin Zelen, National Bureau of Standards
7. The Stochastic Convergence of a Function of Sample Successive Differences,
Lionel Weiss, University of Virginia
8. On a Modified T² Problem, Ingram Olkin, Michigan State College and
S. S. Shrikhande, College of Science, Nagpur
9. The Validity of Sheppard's Corrections for Grouping, F. J. Anscombe,
University of Cambridge and Princeton University
10. Unbiased Tests Based on Unbiased Estimators, Reed B. Dawson, Department
of Defense
11. The Mean Square Error of the Sample Median, Harold Hotelling, University
of North Carolina
12. The Moments of the Sample Median, J. T. Chu and Harold Hotelling,
University of North Carolina
13. Distribution of the Largest Vote in Unstructured Random Balloting, Leo
Katz, Michigan State College
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. Statistical Programming, D. F. Votaw, Jr., Yale University 

5. Exact Tests of Significance for Combining Inter- and Intra-Block In 
formation in Incomplete Block Designs (Preliminary Report) (By title), 
Marvin Zelen, National Bureau of Standards 

. Moments and Related Quantities of the Null Distribution of Linear Con- 
trasts of Order Statistics in the Case of Three Populations (By title), 
Jacques St. Pierre, University of North Carolina 

. Application of Fada di Bruno’s Formula in Mathematical Statistics (By 
title), Eugene Lukacs, Office of Naval Research 

8, On Simultaneous Minimaz Point Estimation (By title), Waldo A. Vezeau 
and Koichi Ito, St. Louis University 

9. Estimation of Structural Parameters when the Number of Incidental 
Parameters is Unbounded (By title), J. Wolfowitz, Cornell University 

. On Power Properties of Certain Simultaneous Tests (By title), K. V. 
Ramachandran, University of North Carolina. 

. On Tests of Normality and Other Tests of Goodness of Fit Based on
Distance Methods (By title), M. Kac, J. Kiefer, and J. Wolfowitz, Cornell
University 

. Tolerance Regions (By title), D. A. S. Fraser and I. Guttman,
University of Toronto.


SUNDAY, SEPTEMBER 12, 1954 
12:00 noon. Council Meeting 
2:00 p.m. Contributed Papers II 


Chairman: Roger Lessard, Ecole Polytechnique de Montreal 
Papers: 1. Comparison of the Power of Nonparametric Two Sample Tests against 
Normal Alternatives, Benjamin Epstein, Wayne University 
2. On the Distribution of Radial Errors Having Normally Distributed
Components, A. C. Cohen, Jr., University of Georgia
3. Confidence Bounds on Departures from a Particular Kind of Multi-Col- 
linearity of Means, S. N. Roy, University of North Carolina
4. The Efficiency of Tests, Wassily Hoeffding and Joan R. Rosenblatt,
University of North Carolina
5. On a Decision Procedure to Select the Population with the Largest Mean 
(Preliminary Report), R. C. Bose and Jacques St. Pierre, University 
of North Carolina 
6. Most Economical Multiple-Decision Rules (By title), William Jackson 
Hall, University of North Carolina 


4:00 p.m. Business Meeting 
MONDAY, SEPTEMBER 13, 1954 


8:30 a.m. Computing Techniques (Cosponsored by Econometric Society)


Chairman: George B. Dantzig, The Rand Corporation
Papers: 1. Linear Programming on the I.B.M. 701, Kurt Eisemann, International
Business Machines Corp.
2. Recent Extensions and Revisions of the Simplex Method, William
Orchard-Hays, The Rand Corporation
3. Separable Convex Functionals and Generalized Simplex Methods,
C. E. Lemke and A. Charnes, Carnegie Institute of Technology
Discussion: Alan J. Hoffman, U.S. Department of Commerce 
10:30 a.m. Biological Cycles (Cosponsored by A.S.A., Biometric Society, and
Econometric Society) 


Chairman: J. W. Hopkins, National Research Council (Canada)
Papers: 1. Population Dynamics, D. A. MacLulich, Royal Canadian Air Force
2. Statistical Problems and Techniques in Population Cycle Analysis, Mark 
Kac, Cornell University 
Discussion: D. B. DeLury, Ontario Research Foundation


LIONEL WEISS 
Associate Secretary 


MINUTES OF MEMBERSHIP MEETING, SEPTEMBER 12, 1954 


A business meeting was called to order at 4:10 P.M., September 12, 1954 in 
the Sheraton Mount Royal Hotel, Montreal, P.Q. by President E. G. Olds. 
About 55 members were present. 

The Secretary read the minutes of the Annual Meeting of December 29, 1953.¹
The minutes were approved as read. 

The Secretary moved, on behalf of 43 members of the Institute, the adoption 
of the following amendment to the Bylaws: 


“All meetings of the Institute shall be held on the basis of no racial segregation. In par-
ticular, prior to determining the place of a forthcoming meeting the Secretary of the In-
stitute shall ascertain that meeting halls, eating facilities, and housing accommodations
adequate for the expected attendance shall be available on a nonsegregated basis, and that
all social events connected with the meeting shall be nonsegregated.”


The Secretary then moved, on behalf of 31 members of the Institute, the 
substitution of the following amendment for the one given immediately above: 


“It is the policy of the Institute of Mathematical Statistics that all its meetings shall
be held on a completely nonsegregated basis. In particular, prior to determining the place
of a forthcoming meeting, the Secretary of the Institute of Mathematical Statistics shall
ascertain that meeting halls and eating facilities adequate for the expected attendance will
be available on a nonsegregated basis and that all social events connected with the meetings
shall be nonsegregated. Every effort shall be made to provide nonsegregated housing ac-
commodations consistent with the laws of the locality of a forthcoming meeting.”


The President announced that members present would be asked to vote on 
the same ballot forms as used by those voting by mail and called for discussion 
of both questions. There was no discussion. Ballots were collected and the tellers 
retired to count the ballots. After adjournment of the meeting the results of the 
balloting were posted as follows: 575 ballots were cast. On the question of sub-
stituting the second amendment above for the first, 126 voted “yes,” 437 voted
“no,” 12 ballots contained no vote on the question. The next vote was therefore
on the adoption of the first amendment above. On this question 311 voted “yes,”
236 voted “no,” 28 ballots contained no vote on the question. Lacking a two-
thirds majority in favor, the motion lost.

¹ Published in the Annals, Vol. 25 (1954) pp. 191-193.

The Secretary-Treasurer gave an informal report. 

The Program Coordinator reported on the progress of plans for future 
meetings. 

The President reported discussion in the Council of several progress reports 
of committees of the Institute. Members were told that the Committee to Ex- 
plore the Desirability of Changing Time of Winter Meetings would welcome 
comments on whether or not they favored Christmas meetings and on other 
related questions. 

The appointment of S. S. Wilks as representative of the Institute in the
Division of Mathematics of the National Research Council for the term July 
1954 to June 1957 was announced. 

The election by the Council, on the nomination by a Committee to Nominate 
an Editor, of Erich L. Lehmann for the term 1956-1958 was announced. 

The election by the Council, on nomination by the Editor, of H. E. Daniels 
as an Associate Editor for a term beginning immediately and continuing until 
December 1955 was announced. 

The addition of T. W. Anderson, Jr., Z. W. Birnbaum, D. H. Blackwell, 
J. L. Doob, W. Feller, John Gurland, C. M. Stein, G. E. Nicholson, Milton 
Sobel, and Murray Rosenblatt to the Committee on a Summer Statistical
Institute and the resignations from this committee of Gertrude M. Cox and Herbert
Solomon were announced. 

The addition of Arnold Court to the Committee to Reexamine the Constitu- 
tion and Bylaws was announced. 

The appointment of W. Feller as Rietz Lecturer for 1955 was announced. 

A. H. Bowker was called on to report informally on the progress of discussions 
within the Committee on Activities and Development. 


Boyd Harshbarger presented the following resolutions which were passed 
by the meeting. 


“Whereas Professor Roger Lessard of the Local Arrangements Committee has provided
excellent arrangements for this meeting be it resolved that we the members of the Institute
of Mathematical Statistics express our appreciation and thanks to him. The Secretary will
record this resolution as a permanent part of the minutes of the meeting and a copy will be
sent to Professor Lessard.

“Whereas Dr. Edwin G. Olds, President, Dr. K. J. Arnold, Secretary, and Dr. Lionel
Weiss, Associate Secretary, have shown outstanding leadership, be it resolved that we the
members of the Institute of Mathematical Statistics express appreciation and thanks to
them. The Secretary will record the resolution as a permanent part of the minutes of the
meeting and copies will be sent to Dr. Olds, Dr. Arnold and Dr. Weiss.

“Whereas Dr. D. B. DeLury and his associates have arranged a stimulating and creative
program be it resolved that we the members of the Institute of Mathematical Statistics
express appreciation and thanks to them. The Secretary will record this resolution as a
permanent part of the minutes of the meeting and a copy will be sent to Dr. DeLury.

“Whereas the City of Montreal and its officials have extended a cordial welcome and have
given a gracious and generous reception for the members and guests, be it resolved that we
the members of the Institute of Mathematical Statistics express appreciation and thanks.
The Secretary will record the resolution as a permanent part of the minutes of this
meeting.”


The meeting adjourned at 5:15 P.M. 


K. J. ARNOLD 
Secretary 




PUBLICATIONS RECEIVED 


“A System of National Accounts and Supporting Tables,” United Nations, 1953, $.50.

Anuario Estadístico de España, Instituto Nacional de Estadística, 1953, 999 pp., 50 pesetas.

BEAUMONT, ROSS A. AND RICHARD W. BALL, Introduction to Modern Algebra and Matrix
Theory, Rinehart & Company, Inc., New York, 1954, $6.00.

CHEVALLEY, CLAUDE C., The Algebraic Theory of Spinors, Columbia University Press, New
York, 1954, $3.75. 

“Concepts and Definitions of Capital Formation,” United Nations, 1953, $.25.

SANDERSON, FRED H., Methods of Crop Forecasting, Harvard University Press, Cambridge,
1954, $5.00 

“Statistics of National Income and Expenditure,” United Nations, 1953, $.60.

Studi di Economia e Statistica, Ser. 1, Vol. II, 1953, Università di Catania, Anno Accademico
1951-52, 380 pp.




INSTITUTIONAL MEMBERS 


INTERNATIONAL BUSINESS MACHINES CORPORATION, New York

IOWA STATE COLLEGE, STATISTICAL LABORATORY, Ames, Iowa

MICHIGAN STATE COLLEGE, DEPARTMENT OF MATHEMATICS, East Lansing, Michigan

PRINCETON UNIVERSITY, DEPARTMENT OF MATHEMATICS, SECTION OF MATHEMATICAL
STATISTICS, Princeton, New Jersey

PURDUE UNIVERSITY LIBRARIES, Lafayette, Indiana

RAYTHEON MANUFACTURING COMPANY, Newton, Massachusetts

STATE UNIVERSITY OF IOWA, Iowa City, Iowa

UNIVERSITY OF CALIFORNIA, STATISTICAL LABORATORY, Berkeley, California

UNIVERSITY OF ILLINOIS, Urbana, Illinois

UNIVERSITY OF NORTH CAROLINA, INSTITUTE OF STATISTICS, Chapel Hill, North Carolina

UNIVERSITY OF WASHINGTON, LABORATORY OF STATISTICAL RESEARCH, Seattle, Washington





TRABAJOS DE ESTADISTICA 


Review published by the “Instituto de Investigaciones Estadísticas” of the “Consejo
Superior de Investigaciones Científicas,” Madrid, Spain.


Vol. V    CONTENTS    Cuad. II


J. NEYMAN  Sur une famille de tests asymptotiques des hypothèses statistiques
composées.
P. ZOROA  Superposición de variables aleatorias y sus aplicaciones.
S. VASPA  A Problem of Encounters.
NOTAS 


CHARLES A. SPOERL  La ciencia actuarial: Una visión general de su desarrollo
teórico.

J. ROYO y S. FERRER  Tabla de números aleatorios obtenida de los números
de la Lotería Nacional Española.

E. LUAGUNEZ y J. L. TENDERO  Notas para la enseñanza de algunos conceptos
elementales de Estadística en el Bachillerato.


Crónicas    Bibliografía    Cuestiones


For all matters concerning papers, exchanges, and subscriptions, write to Prof.
Sixto Ríos, Departamento de Estadística del Consejo Superior de Investigaciones
Científicas, Serrano 123, Madrid, Spain. The Review is composed of three fascicles
published quarterly (about 350 pages) and its price is 80 pts. for Spain and South
America and 3 American dollars for all other countries.
Measures of Association for Multiple Polytomies
LEO A. GOODMAN AND WILLIAM H. KRUSKAL
Unsolved Problems of Experimental Statistics . . . JOHN W. TUKEY
Analysis of Simple Lattice Designs with Unequal Sets of Replications . . . PAUL MEIER
The Use of Normal Probability Paper . . . HERMAN CHERNOFF AND GERALD J. LIEBERMAN
On the Presentation of the Results of Sample Surveys as Legal Evidence
A Test of Goodness of Fit . . . T. W. ANDERSON AND D. A. DARLING
Univariate Two-population Distribution-free Discrimination . . . DAVID S. STOLLER
The Experimental Approach in the Teaching of Statistics
Use of Experiments in Teaching Engineering Statistics . . . IRVING W. BURR
Cyclical Fluctuations in Foundry Activity . . . LLOYD SAVILLE
Cargo Loss in Ferrying Operations . . . WALTER L. DEEMER, JR.
Accuracy of Age Reporting in the 1950 United States Census . . . ROBERT J. MYERS
Validation of Morbidity Survey Data by Comparison with Hospital Records
NEDRA B. BELLOC
STATISTICAL ABSTRACTS 


THE AMERICAN STATISTICAL ASSOCIATION INVITES 
AS MEMBERS ALL PERSONS INTERESTED IN: 

1. Development of new theory and method 

2. Improvement of basic statistical data 
Journal of the Econometric Society
Contents of Vol. 22, No. 4 - October, 1954 


RAGNAR FRISCH: Some Basic Principles of Price of Living Measurements
HERMAN CHERNOFF: Rational Selection of Decision Functions

J. A. C. BROWN: The Consumption of Food in Relation to Household Composition and
Income

W. DUANE EVANS: The Effect of Structural Matrix Errors on Inter-industry Relations
Estimates

KENNETH J. ARROW: Import Substitution in Leontief Models
JOHN F. MUTH: A Note on Balanced Growth
DANIEL B. SUITS: Dynamic Growth under Diminishing Returns to Scale

HANS P. NEISSER: Some Comments on Balanced Growth under Constant Returns to
Scale
RAGNAR FRISCH: Linear Expenditure Functions

Report of the Washington Meeting
Book Reviews, Notes, and Announcements


Published quarterly. Subscription rates available on request.
The Econometric Society is an international society for the advancement of economic theory in its
relation to statistics and mathematics.
Subscriptions to Econometrica and inquiries about the work of the Society and the procedure in applying
for membership should be addressed to Rossen L. Cardwell, Secretary, The Econometric Society, The
University of Chicago, Chicago 37, Illinois, U.S.A.
The Design of an Experiment in which Certain Treatment Arrangements Are Inadmissible. D. R. COX
The Estimation of Location and Scale Parameters from Grouped Data. J. M. HAMMERSLEY and K. W 
MORTON. Transformations of the Binomial, Negative Binomial, Poisson and χ²-Distributions. G,
Goodness of Fit of Frequency Distributions Obtained from Stochastic Processes. V. N. PATANKAR.
On Stationary Processes in the Plane. P. WHITTLE. Rank Analysis of Incomplete Block Designs. II. 
Additional Tables for the Method of Paired Comparisons. R.A. BRADLEY 


The subscription price, payable in advance, is 45s. inland, 54s. export (per volume including postage). Cheques
should be drawn to Biometrika and sent to “The Secretary, Biometrika Office, Department of Statistics, 
Economy
JOHN E. WALSH  Analytic Tests and Confidence Intervals for the Mean Value,
Probabilities, and Percentage Points of a Poisson Distribution 